
User Guide of Leafsoft (TM) Leato

Version 0.42 alpha

(Note: please send any comment about this document to rfc-leato@leafsoft.com)

Thomas Yip
tomtictac-web at yahoo dot com

Leafsoft Co.
http://www.leafsoft.com

 

Note: This is a very preliminary user guide. Please bear with us over any technical errors and any spelling or grammatical mistakes in this document. Thank you.


Copyright (c) 2000 Thomas Yip, Inc. All Rights Reserved.
 
LEAFSOFT MAKES NO REPRESENTATIONS OR WARRANTIES ABOUT THE SUITABILITY OF THE SOFTWARE AND/OR THE DOCUMENT, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT. LEAFSOFT SHALL NOT BE LIABLE FOR ANY DAMAGES SUFFERED BY LICENSEE AS A RESULT OF USING, MODIFYING OR DISTRIBUTING THIS SOFTWARE AND/OR THE DOCUMENT OR ITS DERIVATIVES.


Introduction

Leafsoft's document fragmentation declaration language, Leato, was originally designed to describe XML fragments within Leafsoft Larix. Larix raises some specific requirements which current declaration specifications, for example DTD and Schema, do not address. Larix requires document declaration to be intuitive, compact and extensible. 

Leato is very compact, allows character data to be declared with regular expressions, and supports non-deterministic sub-elements. Leato syntax is also compatible with the XML format, so a declaration can be passed to the Leato validator through any XML parser without modifying the parser. In addition, multiple declarations can be stored as, or embedded in, a single XML document.

Leato isn't meant to be a replacement for DTD. It doesn't scale well to very complicated declarations. But it is certainly handy for declaring small fragments, and it is suitable for a variety of tasks with which DTD and Schema do not help.


Body

1.  Declaration of a fragment

1.1.  Define an element

Defining an element in Leato is straightforward. The element to be declared simply appears as a regular XML element. For example, if we want to allow <ElementName> as the only element in our XML fragment structure, the declaration would be:

    <Leato>
        <ElementName/>
    </Leato>

 

1.2.  Define sub-elements

Compact Mode

Defining sub-elements is just as simple. Consider the following example in compact mode:

    <Leato>
        <Name>
            <First>{#PCDATA}</First>
            <Last>{#PCDATA}</Last>
        </Name>
    </Leato>

It defines that element <Name> allows sub-element <First> followed by <Last>. Also, {#PCDATA} allows parsed character data as the content of <First> and <Last>.
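Because a declaration is itself well-formed XML, any stock XML parser can read it. As a minimal sketch of that point, the following Java program feeds the compact-mode example to the JDK's standard DOM parser and prints an outline of what the parser sees. It does not use the Leato validator; the class name DeclarationOutline and its methods are hypothetical and exist only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DeclarationOutline {

    // Parse a declaration with the JDK's stock XML parser and return an
    // indented outline of the elements and non-blank text it contains.
    static String outline(String declaration) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(declaration)));
            StringBuilder out = new StringBuilder();
            walk(doc.getDocumentElement(), 0, out);
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalArgumentException("declaration is not well-formed XML", ex);
        }
    }

    private static void walk(Node node, int depth, StringBuilder out) {
        String label = null;
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            label = "<" + node.getNodeName() + ">";
        } else if (node.getNodeType() == Node.TEXT_NODE
                && !node.getNodeValue().trim().isEmpty()) {
            // Special elements such as {#PCDATA} arrive as ordinary text.
            label = node.getNodeValue().trim();
        }
        if (label != null) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(label).append('\n');
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    public static void main(String[] args) {
        System.out.print(outline(
            "<Leato><Name><First>{#PCDATA}</First><Last>{#PCDATA}</Last></Name></Leato>"));
    }
}
```

The parser reports {#PCDATA} as plain character data inside <First> and <Last>, which is why no parser changes are needed to carry Leato's special elements.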

 

Expanded Mode

The example above can be rewritten in expanded mode as follows:

    <Leato>
        <Name>
            <First/>
            <Last/>
        </Name>

        <First>{#PCDATA}</First>!
        <Last>{#PCDATA}</Last>!
    </Leato>

It represents the same declaration as in the previous example.

The exclamation mark, "!", specifies that the preceding tag is not a sub-element of the element that encloses it (see section Modifier for details). So, in our example, element <First> is not a valid element after <Name>. In fact, Leato allows only one root element, which is <Name> in the above example. Any element other than the first one will be processed as if it were modified by "!".

"!" is a very convenient device in Leato. Developers can move declarations around to maximize readability. It allows, for example, deeply nested elements to be moved out, which makes the declaration easier to read.

At most one occurrence of elements having the same name may be non-empty in the declaration. In the actual fragment, all empty elements of that name carry all the sub-elements and attributes of the non-empty one. The following declaration is considered invalid:

    <Leato>
        <People>
            <Employee>
                <Name>
                    <First/><Middle/><Last/>
                </Name>
            </Employee>
            <Customer>
                <Name>
                    <First/><Last/>
                </Name>
            </Customer>
        </People>

        <First>{#PCDATA}</First>!
        <Middle>{#PCDATA}</Middle>!
        <Last>{#PCDATA}</Last>!
    </Leato>

In fact, even if all the sub-elements of the two <Name> elements were the same, the declaration would still be invalid.

To correct the error, the declaration may be rewritten as follows:

    <Leato>
        <People>
            <Employee><Name/></Employee>
            <Customer><Name/></Customer>
        </People>

        <Name>
            <First>{#PCDATA}</First>
            <Last>{#PCDATA}</Last>
        </Name>!
    </Leato>

There is no constraint on which element of the same name should contain the sub-element declarations. The above example can be rewritten as follows:

    <Leato>
        <People>
            <Employee>
                <Name>
                    <First>{#PCDATA}</First>
                    <Last>{#PCDATA}</Last>
                </Name>
            </Employee>
            <Customer><Name/></Customer>
        </People>
    </Leato>

However, it generally enhances readability if the developer moves out any sub-element that occurs inside more than one element.

Compact mode and expanded mode can be mixed at the developer's discretion to maximize readability.
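The rule that at most one occurrence of a given element name may be non-empty can be checked mechanically. The following is a rough sketch only, not the Leato validator's own logic: it parses a declaration with the JDK's DOM parser, counts the non-empty occurrences of each name below the <Leato> wrapper, and reports the names that are non-empty more than once. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NonEmptyRuleCheck {

    // Returns the element names that occur as a non-empty element more than once.
    static Set<String> definedMoreThanOnce(String declaration) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(declaration)));
            Map<String, Integer> nonEmptyCount = new HashMap<>();
            count(doc.getDocumentElement(), nonEmptyCount);
            Set<String> result = new TreeSet<>();
            for (Map.Entry<String, Integer> entry : nonEmptyCount.entrySet()) {
                if (entry.getValue() > 1) {
                    result.add(entry.getKey());
                }
            }
            return result;
        } catch (Exception ex) {
            throw new IllegalArgumentException("declaration is not well-formed XML", ex);
        }
    }

    // Visit every element below parent; the parent itself is not counted.
    private static void count(Node parent, Map<String, Integer> nonEmptyCount) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // modifiers such as "!" are sibling text, not elements
            }
            if (hasContent(child)) {
                nonEmptyCount.merge(child.getNodeName(), 1, Integer::sum);
            }
            count(child, nonEmptyCount);
        }
    }

    // An element is non-empty if it holds an element or any non-blank text.
    private static boolean hasContent(Node element) {
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
            if (child.getNodeType() == Node.TEXT_NODE
                    && !child.getNodeValue().trim().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(definedMoreThanOnce(
            "<Leato><A><N><F/></N></A><B><N><F/></N></B></Leato>"));
    }
}
```

Applied to the invalid <People> declaration above, this check would report <Name>, since <Name> is non-empty under both <Employee> and <Customer>.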

Unnamed Root Element

Leato allows the root element to be unnamed. An unnamed root element is declared as <_>. Note that an unnamed root element can have attribute declarations, just like a named root element. Consider the following example:

    <Leato>
        <_>
            <Name>
                <First>{#PCDATA}</First>
                <Last>{#PCDATA}</Last>
            </Name>
        </_>
    </Leato>

The above declaration allows different root elements with the same sub-elements. Both of the fragments below are valid for the above declaration:

example 1/

    <Employee>
        <Name>
            <First>Thomas</First>
            <Last>Yip</Last>
        </Name>
    </Employee>

example 2/

    <Customer>
        <Name>
            <First>Bob</First>
            <Last>Smith</Last>
        </Name>
    </Customer>

Modifier

Besides !, the modifiers ?, * and + can be used to specify the occurrence of an element. They carry their usual meaning from regular expressions.
    "?" specifies that the preceding tag is optional, that is, it occurs zero or one time.
    "*" specifies that the preceding tag can occur zero or more times, and
    "+" specifies that the preceding tag can occur one or more times.

And again,
    "!" specifies that the preceding tag is not a sub-element of the element that encloses it.

And there is one more modifier:
    "~" specifies that the preceding tag matches any element, but having the same name.

A modifier is always placed after the element it modifies, and after the end tag if the element is not written as a single empty tag. For example,

    <Leato>
        <Employee>
            <Name>
                <First>{#PCDATA}</First>
                <Middle>{#PCDATA}</Middle>?
                <Last>{#PCDATA}</Last>
            </Name>
            <Remark/>*
        </Employee>
    </Leato>

The question mark, ?, specifies that the sub-element <Middle> is optional in element <Name>. The asterisk, *, specifies that there can be zero or more <Remark> elements.
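The occurrence modifiers can be read as simple bounds on how many times an element may appear. The sketch below spells those bounds out in Java. It assumes that an element without a modifier must occur exactly once, which is consistent with the sequence rule but is not stated outright in this guide; the class and method names are hypothetical.

```java
public class OccurrenceModifier {

    // Returns true when `count` occurrences satisfy the given modifier.
    // Any character other than ?, * or + is treated as "no modifier".
    static boolean allows(char modifier, int count) {
        switch (modifier) {
            case '?':
                return count == 0 || count == 1; // optional
            case '*':
                return count >= 0;               // zero or more
            case '+':
                return count >= 1;               // one or more
            default:
                return count == 1;               // assumption: exactly once
        }
    }

    public static void main(String[] args) {
        System.out.println("<Middle/>? with 0 occurrences: " + allows('?', 0));
        System.out.println("<Middle/>? with 2 occurrences: " + allows('?', 2));
        System.out.println("<Remark/>* with 5 occurrences: " + allows('*', 5));
    }
}
```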

 

Sequence and Choice

There are two grouping patterns in Leato. Unless overridden, all sub-elements inside an element are treated as a sequence. The sequence <A/><B/><C/> matches an <A> followed by a <B> followed by a <C>. The default can be overridden with a pair of square brackets, [ and ], around the sub-elements. For example,

    <Leato>
        <body>
            [<p/><h1/><h2/><h3/>]
        </body>
    </Leato>

specifies that element <body> can have exactly one of <p>, <h1>, <h2> or <h3>. Choice works as "exclusive or".

If sub-elements are enclosed in a pair of round brackets, ( and ), they form a sequence. Unlike in regular expressions, round brackets denote neither a back reference nor a memory group.

Sequence and choice groups can be nested inside each other. A group can be modified by the modifiers ?, *, + and !. But a group must not be empty, nor may it contain only elements or groups that are all followed by the modifier !.
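One way to picture sequence and choice is by analogy with regular expressions over the names of the child elements. The sketch below is only that analogy, written with java.util.regex; it is not how the Leato validator is implemented. Each child name is encoded as a "name;" token, and the class name, constants and methods are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class GroupingAsRegex {

    // Encode child element names as "name;" tokens so names cannot run together.
    static String encode(List<String> childNames) {
        StringBuilder sb = new StringBuilder();
        for (String name : childNames) {
            sb.append(name).append(';');
        }
        return sb.toString();
    }

    // <A/><B/><C/> : a sequence
    static final Pattern SEQUENCE = Pattern.compile("A;B;C;");

    // [<p/><h1/><h2/><h3/>] : a choice, exactly one alternative
    static final Pattern CHOICE = Pattern.compile("(p;|h1;|h2;|h3;)");

    // [<p/><h1/><h2/><h3/>]* : the same choice, repeated zero or more times
    static final Pattern CHOICE_STAR = Pattern.compile("(p;|h1;|h2;|h3;)*");

    static boolean matches(Pattern model, String... childNames) {
        return model.matcher(encode(Arrays.asList(childNames))).matches();
    }

    public static void main(String[] args) {
        System.out.println("A B C as sequence: " + matches(SEQUENCE, "A", "B", "C"));
        System.out.println("p h1 as choice:    " + matches(CHOICE, "p", "h1"));
        System.out.println("p h1 as choice*:   " + matches(CHOICE_STAR, "p", "h1"));
    }
}
```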

A grouping pattern must be opened and closed at the same level. For example, the following is invalid:

    <Leato>
        <Element>
            [<Subelement>
                 <Subsubelement>]
             </Subelement>
        </Element>
    </Leato>       

 

Deterministic vs. non-deterministic

Leato supports only non-deterministic declaration of sub-element occurrence. However, developers should be aware that, according to the XML specification, "XML processors built using SGML systems may flag non-deterministic content models as errors."

Leato also allows an element to have itself as a sub-element, either directly or inside one of its sub-elements.
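To make "non-deterministic" concrete, consider the content model (<A/><B/>)?<A/><C/>. On reading the first <A>, a matcher cannot yet tell whether it belongs to the optional group or is the mandatory <A>. A common way to handle this is to track a set of possible states. The sketch below shows that general technique on a hand-built state table; the states, class and method names are hypothetical and are not taken from the Leato implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class StateSetMatcher {

    // moves.get(state).get(name) = states reachable from `state` on element `name`
    private final Map<Integer, Map<String, Set<Integer>>> moves = new HashMap<>();
    private final int finalState;

    StateSetMatcher(int finalState) {
        this.finalState = finalState;
    }

    void add(int from, String name, int to) {
        moves.computeIfAbsent(from, k -> new HashMap<>())
             .computeIfAbsent(name, k -> new HashSet<>())
             .add(to);
    }

    // Follow every possible transition at once; state 0 is the start state.
    boolean accepts(List<String> childNames) {
        Set<Integer> current = new HashSet<>(Collections.singleton(0));
        for (String name : childNames) {
            Set<Integer> next = new HashSet<>();
            for (int state : current) {
                Map<String, Set<Integer>> byName = moves.get(state);
                if (byName != null && byName.containsKey(name)) {
                    next.addAll(byName.get(name));
                }
            }
            current = next;
        }
        return current.contains(finalState);
    }

    // Hand-built table for (<A/><B/>)?<A/><C/>
    static StateSetMatcher example() {
        StateSetMatcher m = new StateSetMatcher(4);
        m.add(0, "A", 1); // this A opens the optional group
        m.add(0, "A", 3); // or: the group was skipped, this is the mandatory A
        m.add(1, "B", 2);
        m.add(2, "A", 3);
        m.add(3, "C", 4);
        return m;
    }

    public static void main(String[] args) {
        StateSetMatcher m = example();
        System.out.println(m.accepts(Arrays.asList("A", "C")));
        System.out.println(m.accepts(Arrays.asList("A", "B", "A", "C")));
        System.out.println(m.accepts(Arrays.asList("A", "B", "C")));
    }
}
```

A deterministic matcher would have to commit to one branch at the first <A>; the state set defers that decision until a later element settles it.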

Reference 

An element can be declared to have the same set of attributes and sub-elements as another element. This is done with a reference tag. A reference tag is similar to a regular tag, but uses curly brackets instead of angle brackets. The reference must be the only sub-element of its enclosing element, and it cannot contain sub-elements or attributes. For example, we can rewrite our employee and customer example like this:

    <Leato>
        <People>
            <Employee>{Person/}</Employee>
            <Customer>{Person/}</Customer>
        </People>

        <Person>
            <Name>
                <First>{#PCDATA}</First>
                <Middle>{#PCDATA}</Middle>?
                <Last>{#PCDATA}</Last>
            </Name>
        </Person>!
    </Leato>

An element must not have a sub-element that is a reference to itself.
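Reference resolution can be pictured as a lookup from an element name to the element it refers to. The sketch below follows such references and refuses a reference that leads back to an element already visited. Rejecting longer cycles, and not only direct self-reference, is an assumption of this sketch rather than a documented Leato rule; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class ReferenceResolver {

    // references maps an element name to the name it refers to,
    // e.g. Employee -> Person for <Employee>{Person/}</Employee>.
    static String resolve(Map<String, String> references, String element) {
        Set<String> seen = new LinkedHashSet<>();
        String current = element;
        while (references.containsKey(current)) {
            if (!seen.add(current)) {
                throw new IllegalStateException("reference leads back to itself: " + seen);
            }
            current = references.get(current);
        }
        return current; // the element that carries the real declaration
    }

    public static void main(String[] args) {
        Map<String, String> references = new HashMap<>();
        references.put("Employee", "Person");
        references.put("Customer", "Person");
        System.out.println(resolve(references, "Employee"));
    }
}
```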

 

Special element

All special elements are enclosed in a pair of curly brackets but, unlike references, they start with a # sign or a ~ sign. Some special elements ignore modifiers. In the current implementation, the character data matched by a special element should not contain an entity or notation that would break the character data into pieces. This limitation may be removed in the release version. Any special element with a # sign can appear together with other sub-element declarations. But a special element with a ~ sign must be the only sub-element of its enclosing element.

Consider the special element {#PCDATA} in the following example,

    <Leato>
        <OL>
            (<Li/>{#PCDATA})*
        </OL>
    </Leato>

Notice that {#PCDATA} may be used together with other sub-elements under <OL>.

 

{~ALL}

~ALL matches one or more elements (declared or undeclared), characters, or nothing. In fact, if an element has this special element, its sub-elements, if any, are not checked for validity.

{#PCDATA}

The best-known special element is {#PCDATA}, which stands for Parsed Character Data. It has the same meaning as #PCDATA in the XML specification.

{()}

Syntax:

    {('a' | 'b' | 'c' | 'd')}

{#Re}

Matches PCDATA that matches a regular expression.

Syntax:

    {#Re regular expression}

It matches character data against the regular expression given after "#Re".
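As a rough illustration of the {#Re} special element, the sketch below pulls the expression out of the special element text and tests character data against it with java.util.regex. It assumes that the whole character data must match and that Java's regular expression dialect applies; neither is confirmed by this guide. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReSpecialElement {

    // Captures everything between "{#Re " and the final "}".
    private static final Pattern RE_SPECIAL =
            Pattern.compile("\\{#Re\\s+(.*)\\}", Pattern.DOTALL);

    static boolean matches(String special, String characterData) {
        Matcher m = RE_SPECIAL.matcher(special.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a {#Re ...} element: " + special);
        }
        // Assumption: the entire character data must match the expression.
        return Pattern.matches(m.group(1), characterData);
    }

    public static void main(String[] args) {
        System.out.println(matches("{#Re [0-9]{3}}", "123"));
        System.out.println(matches("{#Re [0-9]{3}}", "12a"));
    }
}
```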

{#EMPTY}

With {#EMPTY} as its sub-element, an element is declared just like a singular element, or like an element with nothing between its start and end tags. But {#EMPTY} can also help the developer make sure that the element is not declared to have sub-elements somewhere else.

{#ANY}

#ANY matches #PCDATA, one or more declared elements, or no element at all.

{#ELEM}

#ELEM matches exactly one declared element.

{#ELEM?}

#ELEM? matches zero or one declared element.

{#ELEM*}

#ELEM* matches any number of declared elements, including none.

{#ELEM+}

#ELEM+ matches one or more declared elements.

{#Leato}

Matches a Leato declaration.

{#Integer}

Syntax:

    {#INTEGER '[0, 100)'}

    [ ... ] -- inclusive
    ( ... ) -- exclusive

Matches an integer, as defined in Java. The range is optional.

A range with an omitted bound, such as "[,200)", is also allowed. It matches any integer from Integer.MIN_VALUE up to, but not including, 200.
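The range notation can be checked with a few lines of Java. The following sketch parses a range string such as "[0, 100)" and tests an int against it, treating an omitted bound as unbounded on that side. It is an illustration of the notation only, not the validator's own parser, and the class and method names are hypothetical.

```java
public class IntegerRange {

    // range looks like "[0, 100)", "(0,2000]" or "[,200)".
    // '[' and ']' are inclusive, '(' and ')' are exclusive.
    static boolean inRange(String range, int value) {
        String r = range.trim();
        boolean minInclusive = r.charAt(0) == '[';
        boolean maxInclusive = r.charAt(r.length() - 1) == ']';
        // limit -1 keeps empty strings, so "[,200)" yields ["", "200"]
        String[] bounds = r.substring(1, r.length() - 1).split(",", -1);
        if (bounds.length != 2) {
            throw new IllegalArgumentException("bad range: " + range);
        }
        String min = bounds[0].trim();
        String max = bounds[1].trim();
        // An omitted bound places no restriction on that side.
        boolean aboveMin = min.isEmpty()
                || (minInclusive ? value >= Integer.parseInt(min)
                                 : value > Integer.parseInt(min));
        boolean belowMax = max.isEmpty()
                || (maxInclusive ? value <= Integer.parseInt(max)
                                 : value < Integer.parseInt(max));
        return aboveMin && belowMax;
    }

    public static void main(String[] args) {
        System.out.println("0 in [0, 100):   " + inRange("[0, 100)", 0));
        System.out.println("100 in [0, 100): " + inRange("[0, 100)", 100));
        System.out.println("199 in [,200):   " + inRange("[,200)", 199));
    }
}
```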

{#Long}

Syntax:

    {#Long '[0, 100)'}

    [ ... ] -- inclusive
    ( ... ) -- exclusive

Matches a long integer, as defined in Java. The range is optional.

{#Float}

Syntax:

    {#Float '[0, 100)'}

{#Double}

Syntax:

    {#Double '[0, 100)'}

Similar to #Integer. If the minimum or maximum of the range is not specified, it is replaced by -Float.MAX_VALUE or Float.MAX_VALUE respectively.

{#Anon}

annotation holder

Syntax:

    {#anon 'CDATA'}

It has no effect on the declaration.

 

Attribute declarations

An attribute declaration consists of four parts. The first part is the attribute name. The second part is an equals sign. The third part is the type of the attribute. The last part is the default. Unsurprisingly, the attribute declaration is placed after the declared element name, inside the element's start tag. Consider the following examples:

Attribute type
CDATA

    <Leato>
        <AnElement attr="#CDATA value"/>
    </Leato>

The above declaration allows the root element <AnElement> to have an attribute named "attr" whose value type is CDATA. The default value is "value".

 
()

Besides CDATA, an attribute type can be an enumeration. Consider the following example:

    <Leato>
        <AnElement attr="(valueA|valueB|valueC) #REQUIRED"/>
    </Leato>

In the above example, attribute "attr" can have either "valueA", "valueB" or "valueC" as its value, and the attribute is required.
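The check implied by an enumerated, required attribute can be sketched in a few lines of Java. The code below takes the declaration text exactly as it appears in the attribute value and tests a candidate value against it, with null standing for a missing attribute. It is an illustration only; the class and method names are hypothetical, and any default keyword other than #REQUIRED is simply treated as "not required".

```java
import java.util.Arrays;
import java.util.List;

public class EnumeratedAttribute {

    // declaration looks like "(valueA|valueB|valueC) #REQUIRED".
    // value is the attribute's value in the fragment, or null if it is absent.
    static boolean isValid(String declaration, String value) {
        String decl = declaration.trim();
        int close = decl.indexOf(')');
        if (!decl.startsWith("(") || close < 0) {
            throw new IllegalArgumentException("not an enumeration: " + declaration);
        }
        List<String> allowed = Arrays.asList(decl.substring(1, close).split("\\|"));
        boolean required = decl.substring(close + 1).trim().equals("#REQUIRED");
        if (value == null) {
            return !required; // a missing attribute is fine unless it is required
        }
        return allowed.contains(value);
    }

    public static void main(String[] args) {
        String decl = "(valueA|valueB|valueC) #REQUIRED";
        System.out.println("valueB:  " + isValid(decl, "valueB"));
        System.out.println("valueD:  " + isValid(decl, "valueD"));
        System.out.println("missing: " + isValid(decl, null));
    }
}
```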

 
Re

An attribute may also be defined using a regular expression. Consider the following example:

  <Leato>
      <AnElement email="Re [._-]+(\@[._-]+) #IMPLIES"/>
  </Leato>

 

Integer,  Long,  Float and Double

  <Leato>
      <AnElement attr="Float (-1,1) #FILL 0"/>
  </Leato>

A numerical type is declared starting with the keyword Integer, Long, Float or Double. The range is optional.

#FILL is a default type that does not exist in DTD. If the attribute does not exist in the XML, Leato will add the attribute with the specified value.
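The effect of #FILL can be sketched as "add the attribute only when it is missing". The code below applies that idea to a plain map of attribute names to values. It illustrates the behaviour described above and is not Leato's own code; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FillDefault {

    // declaration is the attribute declaration text, e.g. "Float (-1,1) #FILL 0".
    // If it carries #FILL and the attribute is missing, the fill value is added.
    static void applyFill(Map<String, String> attributes, String name, String declaration) {
        int at = declaration.indexOf("#FILL");
        if (at < 0) {
            return; // no #FILL in this declaration, nothing to do
        }
        String fillValue = declaration.substring(at + "#FILL".length()).trim();
        attributes.putIfAbsent(name, fillValue);
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new LinkedHashMap<>();
        applyFill(attributes, "attr", "Float (-1,1) #FILL 0");
        System.out.println(attributes);
    }
}
```

An attribute that is already present keeps its own value; only the missing case is filled in.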

 

Validating XML with Leato

Leato can work with a SAX parser. It acts as a middle layer between a non-validating parser and a SAX DocumentHandler. The following code fragment shows how to get Leato working in such a configuration.

import java.io.FileReader;

public static void main( String[] args ) throws Exception {
    // set up the validating rule and get a verifier
    Tree leatoRule = Tree.createTree( new FileReader( args[0] ) );
    Leato l = new Leato( leatoRule );
    Verifier v = l.getVerifier();

    // SAX parser of your choice; syntax may vary with different parsers
    // set the parser's documentHandler and errorHandler to the Leato verifier
    Parser p = new Parser( false /* do not do validating check */ );
    p.setDocumentHandler( v );
    p.setErrorHandler( v );

    // set up the verifier; xmlTarget and errorTarget are the application's
    // own DocumentHandler and ErrorHandler, which receive the verified events
    v.setDocumentHandler( xmlTarget );
    v.setErrorHandler( errorTarget );

    // start parsing
    p.parse();
}
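The middle-layer arrangement can be tried out with the JDK alone. The sketch below uses SAX2's XMLFilterImpl, not the Leato verifier or the SAX1 DocumentHandler interface mentioned above: a filter sits between a non-validating parser and the application's handler, inspects each start tag, and passes every event on. NameCheckFilter is a hypothetical stand-in whose only "rule" is a set of allowed element names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class MiddleLayerSketch {

    // Hypothetical stand-in for a verifier: records element names that are
    // not in the allowed set, then forwards every event downstream.
    static class NameCheckFilter extends XMLFilterImpl {
        private final Set<String> allowed;
        final List<String> rejected = new ArrayList<>();

        NameCheckFilter(XMLReader parent, Set<String> allowed) {
            super(parent);
            this.allowed = allowed;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            if (!allowed.contains(qName)) {
                rejected.add(qName);
            }
            super.startElement(uri, localName, qName, atts); // pass the event on
        }
    }

    static List<String> rejectedNames(String xml, Set<String> allowed) {
        try {
            // non-validating parser -> filter -> application handler
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            NameCheckFilter filter = new NameCheckFilter(parser, allowed);
            filter.setContentHandler(new DefaultHandler()); // the application's handler
            filter.parse(new InputSource(new StringReader(xml)));
            return filter.rejected;
        } catch (Exception ex) {
            throw new IllegalStateException("parsing failed", ex);
        }
    }

    public static void main(String[] args) {
        Set<String> allowed = new HashSet<>(Arrays.asList("Name", "First", "Last"));
        System.out.println(rejectedNames(
            "<Name><First>Thomas</First><Nick>T</Nick></Name>", allowed));
    }
}
```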

Possible extensions of Leato

Query

The following example combines XPath and Leato to perform a query. A list of the <item> elements that satisfy the Leato declaration will be returned.

<Query>
    <Select xpath="/descendant::olist/child::item">
        <item code="#re \bca061">
            <color>{(white|yellow)}</color>
            <size>{#PCDATA}</size>
            <price cnd="#Float (0,2000] #REQUIRED"/>
        </item>
    </Select>
</Query>

Validating an XML fragment in a Java Program

import java.io.IOException;
import java.io.InputStreamReader;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class AnXMLApp extends HttpServlet {

    // declaration of the fragment sent by the old applet
    private DFD oldApplet = new DFD(
          "<Transaction>" +
              "<Param name=\"#CDATA #REQUIRED\">{#PCDATA}</Param>+" +
          "</Transaction>" );

    // declaration of the fragment sent by the new applet
    private DFD newApplet = new DFD(
          "<Transaction>" +
              "<Sales>{#PCDATA}</Sales>" +
              "<Date>{#PCDATA}</Date>" +
              "<Customer id=\"{#PCDATA}\"/>" +
              "<Detail>{#PCDATA}</Detail>" +
          "</Transaction>" );

    public void doPost (HttpServletRequest request,
            HttpServletResponse response)
            throws ServletException, IOException {

        // set content type and other response header fields first
        response.setContentType("text/html");

        // parse the XML fragment posted by the requesting client
        Tree input = Tree.createTree(new InputStreamReader(request.getInputStream()) );
        if ( newApplet.getVerifier().isValid( input ) ) {

             // do the transaction
             // .....

        } else if ( oldApplet.getVerifier().isValid( input ) ) {

             // notify the client to reload the applet....
             // .....

        } else {

             // transaction invalid, ignore it
             // .....

        }
    }
}

Current Implementation of Leato validator

     [...later........]

 

Add your own data type

Leato is implemented in Java, and special elements can easily be added using Java. For example, suppose a developer wants to make a special element called "{#Email}", which allows either an email address string or an element <Email>. The class DFD is extended to add such a special element.

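class MorePowerfulDFD extends com.leafsoft.leato.DFD {
    public MorePowerfulDFD( Tree t ) {
        super( t );
    }
    // return an EmailNode for "{#Email}" and defer to DFD for everything else
    // (the return type is assumed to be the common node base class)
    protected AbstractDFDNode createSpecial( String special ) {
        if ( special.startsWith( "#Email" ) ) {
            return new EmailNode( this );
        }
        return super.createSpecial( special );
    }
}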

A new class, EmailNode, which extends AbstractDFDNode, is also needed:

public class EmailNode extends AbstractDFDNode {
    NDFA next;
    int modifier;
    public EmailNode( DFD root ) {
    }
    public void doEmpty( DFDNodeSet resultSet ) {
        resultSet.add(this);
    }
    public void doElement( DFDNodeSet resultSet, String name ) {
        // accept an <Email> element
        if ( "Email".equals(name) ) {
           resultSet.add(this);
        }
    }
    public void doCharacters( DFDNodeSet resultSet, char[] ch, int start, int length ) {
        String email = new String( ch, start, length );
        // deliberately loose pattern: name@label.label, no second '@'
        if ( email.matches("[^@]+@([^@.]+\\.)+[^@.]+") ) {
            resultSet.add( next );
        }
    }
    public void doIgnorableWhitespace( DFDNodeSet resultSet, String chars ) {
        resultSet.add(this);
    }
    public void doProcessingInstruction( DFDNodeSet resultSet, String target, String data ) {
        resultSet.add(this);
    }
    public String getSymbol() {
        return "{#Email}";
    }
    public boolean isFinalState() {
        return false;
    }
    void addNode( AbstractDFDNode toBeAdded ) {
    }
    void setModifier( int mod ) {
        modifier = mod;
    }
    void close() {
    }
    void setNext( NDFA next ) {
        this.next = next;
    }
}

Please refer to the API documentation for details.

Appendix 

 

EBNF grammar of Leato

EBNF Grammar for Document Fragment Declaration (Leato)

        [still working...............]


Leato ::= Elem

Elems ::= ( '[' Elems ']' | '(' Elems ')' ) ( '?' | '*' | '+' | '#' )?

Elems ::= Elem +

Elem ::= ( SpecElem | RefElem | '<' ElemName '/>' | '<' ElemName Attrs '>' Elems '</' ElemName '>' )

SpecElem ::= '{' ( '#ANY' | '#Leato' | '#ELEM' | '#ELEMS' | '#CDATA' | '#ALLSCOPE' | '#RE' reg ) '/'? '}'

RefElem ::= '{' NMTOKEN '/'? '}'

ElemName ::= NMTOKEN

Attrs ::= Attr *

Attr ::= AttrName '=' '"' attype S Default '"'

Default ::= 'default' | 'implies'

AttrName ::= NMTOKEN

attype ::= 'CDATA' | 'ID' | 'IDREF' | 'IDREFS' | 'ENTITY' | 'ENTITIES' | 'NMTOKEN' | 'NMTOKENS' | Notation | enum | re

Notation ::= 'NOTATION' enum

enum ::= '(' NMTOKEN ( '|' NMTOKEN )* ')'

re ::= 'RE' rexp

Leato in Leato

        [Later.............]

XHTML in Leato

        [Later.............]