Natural language interface
(description of workflow)

The main advantage of this interface over traditional menu-oriented interfaces is that information can be retrieved quickly and efficiently by submitting a single phrase in natural language instead of tediously selecting menu items. The natural language interface is flexible and adapts both to changes in the knowledge domain and to the user's views. It allows arbitrary queries to the FlyEx database to be formulated in several languages (currently English and Russian).

To formulate and execute a query to the database, the user fills in the HTML form of the Natural Language Interface (see Fig. 1).

Figure 1. Input query form of Natural Language Interface
( Go to the same page in Natural Language Interface )

The text of a query is entered in the text field QUERY. For convenience, the list QUERY EXAMPLES contains a set of predefined standard queries. When a query is selected from the list, it is automatically displayed in the field QUERY and may be edited before execution.

By default a query returns all rows that satisfy the selection criterion. The text field MAX. NUMBER OF ROWS RETURNED sets the maximum number of rows returned as the result of a query. If the number of rows satisfying the criterion is smaller than the number indicated in the field, all rows are returned.
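The effect of the row limit can be pictured with a short Python sketch. The function name `limit_rows` and the sample rows are hypothetical and serve only as an illustration; how the interface itself applies the limit is not described here.

```python
from typing import List, Optional, Sequence


def limit_rows(rows: Sequence[str], max_rows: Optional[int] = None) -> List[str]:
    """Return at most max_rows rows; None (field left empty) means no limit."""
    if max_rows is None:
        return list(rows)
    return list(rows[:max_rows])


# Hypothetical rows matching some selection criterion.
matching = ["embryo_a", "embryo_b", "embryo_c"]

print(limit_rows(matching))      # no limit given: all matching rows
print(limit_rows(matching, 2))   # limit below the number of matches: first two rows
print(limit_rows(matching, 10))  # limit above the number of matches: all matching rows
```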

The "Search in" checkboxes specify the databases to be searched. Any combination of databases can be selected. Note that while both Mooshka and FlyEx provide access to data on the expression of segmentation genes in the Drosophila blastoderm, they were developed using different technologies (relational and multidimensional database technology) and differ in their contents.

Selecting the link SWITCH TO RUSSIAN opens the Russian version of the query form. Queries in Russian are submitted and executed in the same way as queries in English (see Fig. 2).

Figure 2. The Russian language version of the query form of Natural Language Interface
( Go to the same page in Natural Language Interface )

Pressing the button SEND QUERY executes the query; after a short delay the result appears in a new browser window. The upper part of this window displays the query in natural language, with the words used to retrieve information from the database shown in red. The SQL query automatically generated by the system is presented below the NL query. The SQL query can be edited and resubmitted to the server by pressing the button SEND QUERY. The query result is displayed as a table. Figure 3 presents the result of the query "Which embryos were scanned for expression of Kruppel, giant and even-skipped?".

Figure 3. The result of the query "Which embryos were scanned for expression of Kruppel, giant and even-skipped?"
( Go to the same page in Natural Language Interface )

To formulate an NL query, a user can use any concept described in the conceptual scheme. Queries that use higher-level concepts and queries that use lower-level concepts are interpreted by the system equally well. For example, cleavage cycles 11 to 13 are parts of developmental stage 4, so the query `What embryos belong to stage 4?' returns a list of embryos belonging to these cycles.
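One way to picture how a higher-level concept resolves to lower-level ones is the following minimal Python sketch. The dictionary `PARTS` and the function `expand` are hypothetical; only the relation between stage 4 and cleavage cycles 11 to 13 is taken from the text, and the real conceptual scheme is certainly richer.

```python
from typing import Dict, List

# Hypothetical fragment of a conceptual scheme: each higher-level concept
# maps to the lower-level concepts that are its parts.
PARTS: Dict[str, List[str]] = {
    "stage 4": ["cleavage cycle 11", "cleavage cycle 12", "cleavage cycle 13"],
}


def expand(concept: str) -> List[str]:
    """Recursively replace a concept by its lowest-level parts."""
    parts = PARTS.get(concept)
    if parts is None:
        return [concept]  # already a lowest-level concept
    result: List[str] = []
    for part in parts:
        result.extend(expand(part))
    return result


print(expand("stage 4"))
print(expand("cleavage cycle 12"))
```

With such an expansion, a query about stage 4 and a query naming the three cycles explicitly end up selecting on the same set of lower-level values.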

In an NL query, words may be typed in any word form (for example, embryo or embryos, gene or genes). The query can be formulated either as a whole phrase or as a list of keywords. For instance, the query `embryos Kr gt eve' returns the same result as the query `Which embryos were scanned for expression of Kruppel, giant and even-skipped?'

Queries can be formulated using synonyms or even laboratory jargon. For example, a gene can be referred to either by its full name or by its symbol. To retrieve information about a gene group, one can equally use the terms `gene group' and `combination of genes', or an abbreviation consisting of three capital letters, each of which corresponds to the common notation of a gene (e.g. `BHE' - group scanned for expression of bcd, gt and eve).
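The equivalence of word forms, synonyms, whole phrases and keyword lists can be illustrated with a small Python sketch. It assumes a simple lookup table (`SYNONYMS`, hypothetical) that maps each recognised word to one canonical concept and ignores every other word; the linguistic processing actually used by the system is not described here and is likely more elaborate.

```python
import re
from typing import Set

# Hypothetical lookup table: word forms, full gene names and gene symbols
# all map to a single canonical concept.
SYNONYMS = {
    "embryo": "embryo", "embryos": "embryo",
    "kruppel": "Kr", "kr": "Kr",
    "giant": "gt", "gt": "gt",
    "even-skipped": "eve", "eve": "eve",
}


def concepts(query: str) -> Set[str]:
    """Return the set of known concepts mentioned in a query."""
    # Words may contain internal hyphens, as in 'even-skipped'.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", query.lower())
    return {SYNONYMS[token] for token in tokens if token in SYNONYMS}


full = "Which embryos were scanned for expression of Kruppel, giant and even-skipped?"
keywords = "embryos Kr gt eve"

# Both formulations reduce to the same set of concepts.
print(concepts(full) == concepts(keywords))
```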

The query `How many ...?' counts the rows satisfying any criterion (e.g., `How many embryos are scanned for expression of bcd and belong to late temporal classes?'; see Fig. 4). In addition, these rows can be listed on the screen (for example, `How many fluorophores were used to detect expression of eve? List these fluorophores.'; see Fig. 5).
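In SQL terms, a counting query and its listing follow-up might look like the sketch below, which uses Python's built-in `sqlite3` module. The table `scan`, its columns and all data values are hypothetical placeholders, and the SQL text is an assumption about what such queries could look like, not the SQL that the system generates.

```python
import sqlite3

# Hypothetical in-memory table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scan (embryo TEXT, gene TEXT, fluorophore TEXT)")
conn.executemany(
    "INSERT INTO scan VALUES (?, ?, ?)",
    [
        ("e1", "eve", "fluor_a"),
        ("e2", "eve", "fluor_b"),
        ("e3", "eve", "fluor_a"),
        ("e3", "bcd", "fluor_c"),
    ],
)

# 'How many fluorophores were used to detect expression of eve?'
count = conn.execute(
    "SELECT COUNT(DISTINCT fluorophore) FROM scan WHERE gene = ?", ("eve",)
).fetchone()[0]

# 'List these fluorophores.'
names = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT fluorophore FROM scan WHERE gene = ? ORDER BY fluorophore",
        ("eve",),
    )
]

print(count, names)
```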

Figure 4. The result of the query "How many embryos are scanned for expression of bcd and belong to late temporal classes?"
( Go to the same page in Natural Language Interface )

Figure 5. The result of the query "How many embryos are scanned for expression of bcd and belong to late temporal classes? List these embryos."
( Go to the same page in Natural Language Interface )

An important feature of the system is that arbitrary conditions can be imposed on the values of numerical attributes. The following constructions are supported: larger than, greater than, more than, >, >=, <, less than, smaller than, from n to m, n - m.
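The constructions listed above could be translated into SQL predicates roughly as in the following Python sketch. The function `to_sql`, the attribute names in the usage lines and the choice of an inclusive `BETWEEN` for ranges are assumptions made for illustration; the text does not state how the system treats range boundaries.

```python
import re

# Comparison phrases from the list above, mapped to SQL operators.
OPERATORS = {
    "larger than": ">", "greater than": ">", "more than": ">", ">": ">",
    ">=": ">=",
    "less than": "<", "smaller than": "<", "<": "<",
}

NUMBER = r"\d+(?:\.\d+)?"


def to_sql(attribute: str, condition: str) -> str:
    """Translate one numeric condition into a SQL predicate (sketch)."""
    text = " ".join(condition.lower().split())
    text = re.sub(r">\s*=", ">=", text)  # accept '> =' as well as '>='
    # Ranges: 'from n to m' or 'n - m' (assumed inclusive here).
    rng = re.fullmatch(rf"(?:from )?({NUMBER}) ?(?:to|-) ?({NUMBER})", text)
    if rng:
        return f"{attribute} BETWEEN {rng.group(1)} AND {rng.group(2)}"
    # Try longer phrases first so that '>=' is not mistaken for '>'.
    for phrase in sorted(OPERATORS, key=len, reverse=True):
        if text.startswith(phrase):
            value = text[len(phrase):].strip()
            if re.fullmatch(NUMBER, value):
                return f"{attribute} {OPERATORS[phrase]} {value}"
    raise ValueError(f"unsupported condition: {condition!r}")


print(to_sql("temporal_class", "more than 3"))
print(to_sql("cleavage_cycle", "from 11 to 13"))
print(to_sql("temporal_class", "> = 5"))
```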

Selection criteria in a query can be combined using the logical operators AND, OR and NOT. For example, the query `Which embryos were scanned for expression of Kruppel and giant and even-skipped?' returns a list of embryos that were scanned for expression of all these genes (see Fig. 3), while the query `Which embryos were scanned for expression of Kruppel or giant or even-skipped?' returns a list of embryos scanned for expression of at least one of these genes (see Fig. 6).
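The difference between the AND and OR forms of the query can be shown with sets in a few lines of Python. The embryo names and gene assignments in `SCANNED` are hypothetical, and NOT is left out because its exact interpretation is not described here.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical data: embryo name -> set of genes it was scanned for.
SCANNED: Dict[str, Set[str]] = {
    "emb1": {"Kr", "gt", "eve"},
    "emb2": {"Kr", "eve"},
    "emb3": {"bcd", "hb"},
}


def embryos_scanned_for(genes: Iterable[str], mode: str = "and") -> List[str]:
    """Select embryos scanned for all ('and') or any ('or') of the genes."""
    wanted = set(genes)
    if mode == "and":
        return sorted(e for e, g in SCANNED.items() if wanted <= g)
    if mode == "or":
        return sorted(e for e, g in SCANNED.items() if wanted & g)
    raise ValueError(f"unknown mode: {mode!r}")


genes = ["Kr", "gt", "eve"]
print(embryos_scanned_for(genes, "and"))  # scanned for all three genes
print(embryos_scanned_for(genes, "or"))   # scanned for at least one of them
```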

Figure 6. The result of the query "Which embryos were scanned for expression of Kruppel or giant or even-skipped?"
( Go to the same page in Natural Language Interface )

The query `Display pattern ...' returns a pattern of segmentation gene expression. When several patterns in different embryos are requested, an embryo list is displayed in which embryo names are linked to embryo images (see Fig. 7). If a pattern in a single embryo is requested, it is displayed at once, without the embryo list. If the pattern of only one gene is requested, a single stained image is displayed (see Fig. 8); otherwise a multiple stained image is presented to the user (see Fig. 9).
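The display rules described above amount to a small decision procedure, sketched here in Python. The function `choose_display` and its return strings are hypothetical and merely restate the rules; they are not part of the interface.

```python
def choose_display(n_embryos: int, n_genes: int) -> str:
    """Pick the presentation for a `Display pattern ...' query (sketch)."""
    if n_embryos < 1 or n_genes < 1:
        raise ValueError("at least one embryo and one gene are required")
    if n_embryos > 1:
        # Several embryos: show a list whose names link to the images.
        return "embryo list with links to images"
    if n_genes == 1:
        return "single stained image"
    return "multiple stained image"


print(choose_display(n_embryos=5, n_genes=1))
print(choose_display(n_embryos=1, n_genes=1))
print(choose_display(n_embryos=1, n_genes=3))
```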

Figure 7. The result of the query "Display patterns of expression of Kr in embryos belonging to temporal class 2."
( Go to the same page in Natural Language Interface )

Figure 8. The result of the query "Display patterns of expression of Kr in embryo cq7."
( Go to the same page in Natural Language Interface )

Figure 9. The result of the query "Patterns of expression of all genes scanned in embryo cq7."
( Go to the same page in Natural Language Interface )

Quantitative and processed expression data can be displayed in different formats. To retrieve this information, the user specifies the desired format in the query, e.g. `Select as a flat graph a quantitative data on expression of Kruppel in embryos belonging to temporal class 3'. Quantitative gene expression data selected from the FlyEx database can be presented to the user as a table (see Fig. 10), a flat graph or a 3-D graph.

Figure 10. The result of the query "Display quantitative gene expression data for embryo cq7 as a table."
( Go to the same page in Natural Language Interface )

Registered gene expression data are presented to the user as a table or as a graph. FlyEx stores two sets of registered data obtained by two different registration methods, SpA and FRDWT.
