[![pipeline status](https://gitlab.linphone.org/BC/public/belr/badges/master/pipeline.svg)](https://gitlab.linphone.org/BC/public/belr/commits/master)

What is Belr
============

Belr is Belledonne Communications' language recognition library, written in C++11.
It parses text inputs formatted according to a language defined by an ABNF grammar,
such as the protocols standardized at IETF.

It drastically simplifies the writing of a parser, provided that the parsed language is defined with an *ABNF grammar[1]*.
The parser automaton is automatically generated by the belr library, in memory, from the ABNF grammar text.
The application developer is responsible for connecting belr's parser to their custom code through callbacks, in order to be
notified of recognized language elements.

It is based on finite state machine theory and relies heavily on recursion from an implementation standpoint.
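
To give an idea of what "relying on recursion" can mean for a recognizer, here is a minimal sketch in which each grammar construct is an object whose `feed()` method calls `feed()` on its children. The code is not taken from belr: the class names, the `feed()` signature and the `makeAlphanumRun()` helper are all assumptions made for illustration, and the sketch covers only character ranges, alternation and repetition.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical recognizer interface (not belr's API).
struct Recognizer {
	virtual ~Recognizer() = default;
	// Returns the number of characters matched at `pos`, or -1 on failure.
	virtual int feed(const std::string &input, std::size_t pos) const = 0;
};

// Matches one character in the inclusive range [lo, hi].
struct CharRange : Recognizer {
	CharRange(char lo, char hi) : mLo(lo), mHi(hi) {}
	int feed(const std::string &input, std::size_t pos) const override {
		return (pos < input.size() && input[pos] >= mLo && input[pos] <= mHi) ? 1 : -1;
	}
	char mLo, mHi;
};

// Alternation (ABNF "/"): the first child that matches wins.
struct Selector : Recognizer {
	std::vector<std::shared_ptr<Recognizer>> children;
	int feed(const std::string &input, std::size_t pos) const override {
		for (const auto &child : children) {
			int n = child->feed(input, pos); // recursive descent into the child
			if (n >= 0) return n;
		}
		return -1;
	}
};

// Repetition (ABNF "min*"): applies its child as many times as possible.
struct Loop : Recognizer {
	Loop(std::shared_ptr<Recognizer> child, int min) : mChild(child), mMin(min) {}
	int feed(const std::string &input, std::size_t pos) const override {
		int total = 0, count = 0;
		for (;;) {
			int n = mChild->feed(input, pos + total);
			if (n <= 0) break; // stop on failure (or on an empty match)
			total += n;
			++count;
		}
		return count >= mMin ? total : -1;
	}
	std::shared_ptr<Recognizer> mChild;
	int mMin;
};

// Builds the automaton for: 1*( %x61-7A / %x41-5A / %x30-39 )
std::shared_ptr<Recognizer> makeAlphanumRun() {
	auto choice = std::make_shared<Selector>();
	choice->children.push_back(std::make_shared<CharRange>('a', 'z'));
	choice->children.push_back(std::make_shared<CharRange>('A', 'Z'));
	choice->children.push_back(std::make_shared<CharRange>('0', '9'));
	return std::make_shared<Loop>(choice, 1);
}
```

Under these assumptions, `makeAlphanumRun()->feed("bob@host", 0)` returns 3: the loop matches `bob` and stops at `@`, which none of the three ranges accepts.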

The benefits of using belr are:
- belr is safe: no hand-written code to parse the language. No buffer overflow. No mistakes in ABNF interpretation.
- belr saves time: a lot of human effort is eliminated because the parser is automatically generated.
- belr saves space: belr does not generate source code files to insert in your build process. The parser automaton is created at runtime, in memory.
- belr is fast: it was around 50% faster than antlr/antlr3c at parsing SIP URIs.
- belr is flexible: you are free to design your parser API as you want, and simply connect the parser automaton with your API.

License
=======

Copyright © [Belledonne Communications SARL](https://www.linphone.org), all rights reserved.

Belr is dual licensed:
- under a GNU GPLv3 license for free (see LICENSE.txt file for details)
- under a proprietary license, for closed source projects. Contact sales@belledonne-communications.com for costs and other service information.


How it works
============

Let's walk through a very basic example.
Your application first needs to create a Grammar object from a text file containing the ABNF grammar description:

```
ABNFGrammarBuilder builder;
// The grammar is constructed from sipgrammar.txt file, plus an additional built-in grammar called 'CoreRules',
// which is used by almost every grammar.
shared_ptr<Grammar> grammar=builder.createFromAbnfFile("sipgrammar.txt", make_shared<CoreRules>());
```

Then, from the grammar object returned, instantiate a parser object by telling belr the name of a base class you have defined
to represent any element of the language. In the example below, it is called `SipElement`.
The parser object can be used as many times as needed. There is no need to re-instantiate it each time you need to parse a new input!

```
Parser<shared_ptr<SipElement>> parser(grammar);
```

Now, you have to connect the parser with your own classes in order to have language elements filled into your objects.

```
parser.setHandler("SIP-URI", make_fn(&SipUri::create)) // whenever a SIP-URI is found, a SipUri object is created
		->setCollector("user", make_sfn(&SipUri::setUsername)) // when a "user" field is found, SipUri::setUsername() is called to assign it
		->setCollector("host", make_sfn(&SipUri::setHost)) // when a "host" is found, SipUri::setHost() assigns it to our SipUri object
		->setCollector("port", make_sfn(&SipUri::setPort)); // likewise, "port" is assigned through SipUri::setPort()
```

Here, we have instructed our belr parser to invoke our `SipUri::create()` each time it recognizes a SIP-URI. This method must simply
return a new `SipUri` instance.
We also told it that each time the `user` part of a SIP URI is recognized, the `SipUri::setUsername(const std::string& user)` method must be called
to store the recognized user part in the created `SipUri` instance.
Similarly, the `host` part is assigned with the `SipUri::setHost()` method, and the `port` part with the `SipUri::setPort()` method.
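
For reference, the classes assumed by the handler and collector calls above could look like the following minimal sketch. Only the names `SipElement`, `SipUri`, `create()`, `setUsername()`, `setHost()` and `setPort()` come from this README; the member layout, the getters and the `int` type of the port are assumptions made for illustration. tools/belr-demo.cc remains the authoritative example.

```cpp
#include <memory>
#include <string>

// Base class representing any element of the language; this is the type
// given to the Parser<> template in the snippets above.
class SipElement {
public:
	virtual ~SipElement() = default; // polymorphic, so dynamic_pointer_cast works
};

// Receives the parts of a recognized SIP-URI.
class SipUri : public SipElement {
public:
	// Handler target: returns a fresh, empty SipUri.
	static std::shared_ptr<SipUri> create() {
		return std::make_shared<SipUri>();
	}

	// Collector targets: each one stores a recognized sub-element.
	void setUsername(const std::string &user) { mUsername = user; }
	void setHost(const std::string &host) { mHost = host; }
	void setPort(int port) { mPort = port; } // assumption: the port arrives as an int

	// Hypothetical getters, not mentioned in this README.
	const std::string &getUsername() const { return mUsername; }
	const std::string &getHost() const { return mHost; }
	int getPort() const { return mPort; }

private:
	std::string mUsername;
	std::string mHost;
	int mPort = 0; // assumption: 0 means "no port given"
};
```

Keeping a common polymorphic base class is what allows a single parser to return any language element as a `shared_ptr<SipElement>`, which the application then downcasts to the concrete type it expects.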

Finally, you can now parse a SIP-URI: 

```
size_t parsedSize = 0;
string inputToParse = "sip:bob@sip.example.org";
shared_ptr<SipElement> ret = parser.parseInput("SIP-URI", inputToParse, &parsedSize);
// If the SIP URI is recognized, the return value is non-null and you can cast it into a SipUri object.
if (ret){
	shared_ptr<SipUri> sipUri = dynamic_pointer_cast<SipUri>(ret);
	// Do what you want with the SipUri object...
}
```

The full example is in tools/belr-demo.cc.

One last thing to know: grammar creation from text files requires many computations, which can slow down the startup of your application.
Fortunately, a solution exists: use the `belr-compiler` tool to generate a binary representation of the grammar, saved to disk and included
as a resource in your application.
You can view the binary grammar as a kind of byte-code representing the language automaton.
Your application can then simply instantiate the `Grammar` object by loading it from disk, which is a hundred times faster.

Dependencies
============

- *bctoolbox[2]*: our portability layer


Build Belr
==========

	cmake . -DCMAKE_INSTALL_PREFIX=<prefix> -DCMAKE_PREFIX_PATH=<search_prefixes>
	
	make
	make install


Limitations
===========

Belr doesn't handle non-deterministic ABNF grammars. For example:
```
token       =  1*(alphanum / "-" / "." / "!" / "%" / "*")
my-element  =  token "!"
```
The problem with this grammar is that "!" is part of `token`, and is also the termination character of `my-element`.
It is non-deterministic: when the automaton finds a "!", it can recognize it as part of a `token`, or as the last
element of `my-element`.

Unfortunately, this kind of situation is not so rare.
Belr's current logic always includes the "!" in `token`, because `token` is the first rule that matches in the sequence.

The solution would be to have belr explore both possibilities; however, this is not implemented as of today (2019-09-17).
Most of the time, a workaround exists: rewrite the problematic grammar rule in such a way that the ambiguity disappears.
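
The behaviour described above can be reproduced with a small hand-written matcher. This is not belr's code: `isTokenChar`, `matchToken` and `matchMyElement` are hypothetical helpers that only mimic a greedy repetition with no backtracking, and `alphanum` is assumed to mean ASCII letters and digits.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True for the characters allowed by `token` in the grammar above.
static bool isTokenChar(char c) {
	return std::isalnum(static_cast<unsigned char>(c)) || c == '-' || c == '.' ||
	       c == '!' || c == '%' || c == '*';
}

// Greedy 1*( ... ): consumes as many token characters as possible from `pos`.
// Returns the number of characters consumed, or 0 if `token` does not match.
static std::size_t matchToken(const std::string &input, std::size_t pos) {
	std::size_t end = pos;
	while (end < input.size() && isTokenChar(input[end])) ++end;
	return end - pos;
}

// my-element = token "!" with no backtracking: once `token` has consumed its
// characters, none of them is given back to let the trailing "!" match.
static bool matchMyElement(const std::string &input) {
	std::size_t consumed = matchToken(input, 0);
	if (consumed == 0) return false;
	// The whole input must be exactly: token, then one "!".
	return consumed + 1 == input.size() && input[consumed] == '!';
}
```

With this matcher, `matchMyElement("abc!")` returns `false` even though the input is valid according to the grammar: `matchToken` has already swallowed the final "!", so nothing is left for the terminator.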


Build options
=============

* `CMAKE_INSTALL_PREFIX=<string>`: install prefix
* `CMAKE_PREFIX_PATH=<string>`: semicolon-separated list of prefixes where to search for dependencies
* `ENABLE_SHARED=NO`: do not build the shared library
* `ENABLE_STATIC=NO`: do not build the static library
* `ENABLE_STRICT=NO`: build without strict compilation flags (-Wall -Werror)
* `ENABLE_TOOLS=NO`: do not build tools (belr-demo, belr-parse)


Note for packagers
==================

Our CMake scripts may automatically add some paths to the runtime search paths (rpath) of generated binaries.
To ensure that the installed binaries are stripped of any rpath, use `-DCMAKE_SKIP_INSTALL_RPATH=ON`
when invoking cmake.

Rpm packaging
=============

The belr rpm package can be generated with cmake3 using the following commands:

	mkdir WORK
	cd WORK
	cmake3 ../
	make package_source
	rpmbuild -ta --clean --rmsource --rmspec belr-<version>-<release>.tar.gz


-----------------------

* [1] https://tools.ietf.org/html/rfc5234
* [2] git://git.linphone.org/bctoolbox.git or <http://www.linphone.org/releases/sources/bctoolbox>