What is Belr
Belr is Belledonne Communications' language recognition library, written in C++11. It parses text inputs formatted according to a language defined by an ABNF grammar, such as the protocols standardized by the IETF.
It drastically simplifies the writing of a parser, provided that the parsed language is defined with an ABNF grammar[1]. The parser automaton is generated automatically by the belr library, in memory, from the ABNF grammar text. The application developer is responsible for connecting belr's parser with custom code through callbacks in order to be notified of recognized language elements.
It is based on finite state machine theory and relies heavily on recursion from an implementation standpoint.
The benefits of using belr are:
- belr is safe: no hand-written code to parse the language. No buffer overflow. No mistakes in ABNF interpretation.
- belr saves time: a lot of human effort is eliminated because the parser is generated automatically.
- belr saves space: belr does not generate source code files that you would have to insert into your build process. The parser automaton is created at runtime, in memory.
- belr is fast: it was around 50% faster than antlr/antlr3c at parsing SIP URIs.
- belr is flexible: you are free to design your parser API as you want, and simply connect the parser automaton with your API.
License
Copyright © Belledonne Communications SARL, all rights reserved.
Belr is dual licensed:
- under a GNU GPLv3 license for free (see LICENSE.txt file for details)
- under a proprietary license, for closed source projects. Contact sales@belledonne-communications.com for costs and other service information.
How it works
Let's take a very basic example to understand. Your application first needs to create a Grammar object from a text file containing the ABNF grammar description:
ABNFGrammarBuilder builder;
// The grammar is constructed from sipgrammar.txt file, plus an additional built-in grammar called 'CoreRules',
// which is used by almost every grammar.
shared_ptr<Grammar> grammar=builder.createFromAbnfFile("sipgrammar.txt", make_shared<CoreRules>());
Then, from the grammar object returned, instantiate a parser object by telling belr the name of a base class you have defined to represent any element of the language. In the example below, it is called SipElement.
The parser object can be used as many times as needed. There is no need to re-instantiate it each time you need to parse a new input!
Parser<shared_ptr<SipElement>> parser(grammar);
Now, you have to connect the parser with your own classes in order to have language elements filled into your objects.
parser.setHandler("SIP-URI", make_fn(&SipUri::create)) // whenever a SIP-URI is found, a SipUri object must be created
    ->setCollector("user", make_sfn(&SipUri::setUsername)) // when a "user" field is found, SipUri::setUsername() is called to assign it
    ->setCollector("host", make_sfn(&SipUri::setHost)) // when "host" is encountered, SipUri::setHost() assigns it to our SipUri object
    ->setCollector("port", make_sfn(&SipUri::setPort)); // likewise for "port", through SipUri::setPort()
Here, we have instructed our belr parser to invoke our SipUri::create() each time it recognizes a SIP-URI. This method must simply return a new SipUri instance.
We also told it that each time the user part of a SIP URI is recognized, the SipUri::setUsername(const std::string& user) method must be called to fill the recognized user part into the created SipUri instance.
Similarly, we assign the host part with the SipUri::setHost() method, and the port part with the SipUri::setPort() method.
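For reference, the application-side classes that these handlers and collectors target could look like the sketch below. It uses no belr API at all. SipElement, SipUri, create(), setUsername(), setHost() and setPort() are the names used in the text; the getters, the member names, and the choice of int for the port are assumptions made for illustration, so refer to tools/belr-demo.cc for the real definitions.

```cpp
#include <memory>
#include <string>

// Base class representing any element of the language.
// The virtual destructor makes it polymorphic, which dynamic_pointer_cast requires.
class SipElement {
public:
    virtual ~SipElement() = default;
};

class SipUri : public SipElement {
public:
    // Factory of the kind registered for "SIP-URI": returns a new, empty instance.
    static std::shared_ptr<SipUri> create() { return std::make_shared<SipUri>(); }

    // Setters of the kind registered as collectors.
    void setUsername(const std::string &user) { mUsername = user; }
    void setHost(const std::string &host) { mHost = host; }
    void setPort(int port) { mPort = port; }  // assumption: the port is collected as an int

    // Hypothetical getters, added only so the result can be inspected.
    const std::string &getUsername() const { return mUsername; }
    const std::string &getHost() const { return mHost; }
    int getPort() const { return mPort; }

private:
    std::string mUsername;
    std::string mHost;
    int mPort = 0;  // 0 means "no port present" in this sketch
};
```

Keeping the factory static and the setters as plain member functions means the class carries no parser-specific code; the association with grammar rule names lives entirely in the setHandler()/setCollector() calls.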
Finally, you can now parse a SIP-URI:
size_t parsedSize;
string inputToParse = "sip:bob@sip.example.org";
shared_ptr<SipElement> ret = parser.parseInput("SIP-URI", inputToParse, &parsedSize);
// If the SIP URI is recognized, the return value is non-null and can be cast to a SipUri object.
if (ret) {
    shared_ptr<SipUri> sipUri = dynamic_pointer_cast<SipUri>(ret);
    // Do what you want with the SipUri object...
}
The full example is in tools/belr-demo.cc.
One last thing to know: creating a grammar from a text file requires many computations, which can slow down the startup of your application.
Fortunately, a solution exists: use the belr-compiler tool to generate a binary representation of the grammar, saved to disk and included as a resource in your application.
You can view the binary grammar as a kind of byte-code representing the language automaton.
Your application can then simply instantiate the Grammar object by loading it from disk, which is a hundred times faster.
Dependencies
- bctoolbox[2]: our portability layer
Build Belr
cmake . -DCMAKE_INSTALL_PREFIX=<prefix> -DCMAKE_PREFIX_PATH=<search_prefixes>
make
make install
Limitations
Belr doesn't handle non-deterministic ABNF grammars. For example:
token = 1*(alphanum / "-" / "." / "!" / "%" / "*")
my-element = token "!"
The problem with this grammar is that "!" is part of token, and is also the termination character of my-element.
It is non-deterministic: when the automaton finds a "!", it can recognize it either as part of a token or as the last element of my-element.
Unfortunately, this kind of situation is not so rare.
Belr's current logic always includes the "!" in token, because token is the first rule that matches in the sequence.
The solution would be for belr to explore both possibilities; however, this is not implemented as of today (2019-09-17). Most of the time, a workaround exists: rewrite the problematic grammar rule in such a way that the ambiguity disappears.
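To make the ambiguity concrete, here is a small standalone model of the two rules above. It is not belr's implementation and uses no belr API: all function names are hypothetical, and alphanum is assumed to mean ASCII letters and digits. The first recognizer consumes token greedily and never backtracks, which mirrors the behaviour described above; the second one tries shorter token matches, which is the "explore both possibilities" strategy.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True if c belongs to the character set of the `token` rule.
bool isTokenChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '-' || c == '.' ||
           c == '!' || c == '%' || c == '*';
}

// Greedy match of token = 1*( ... ) starting at pos.
// Returns the number of characters consumed, or 0 if there is no match.
std::size_t matchTokenGreedy(const std::string &in, std::size_t pos) {
    std::size_t n = 0;
    while (pos + n < in.size() && isTokenChar(in[pos + n])) ++n;
    return n;
}

// my-element = token "!" evaluated left to right without backtracking.
// Because token has already swallowed every trailing '!', the final "!" can
// never be matched, so this function returns false for every input.
bool matchMyElementGreedy(const std::string &in) {
    std::size_t n = matchTokenGreedy(in, 0);
    if (n == 0) return false;
    return n + 1 == in.size() && in[n] == '!';
}

// Same rule, but shorter token matches are tried until the trailing "!" fits.
bool matchMyElementBacktracking(const std::string &in) {
    std::size_t longest = matchTokenGreedy(in, 0);
    for (std::size_t n = longest; n >= 1; --n) {
        if (n + 1 == in.size() && in[n] == '!') return true;
    }
    return false;
}
```

With the input "abc!", which is a valid my-element according to the grammar, the greedy version rejects the input while the backtracking version accepts it by giving token only "abc".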
Build options
- CMAKE_INSTALL_PREFIX=<string>: install prefix
- CMAKE_PREFIX_PATH=<string>: semicolon-separated list of prefixes where to search for dependencies
- ENABLE_STRICT=NO: build without strict compilation flags (-Wall -Werror)
- ENABLE_TOOLS=NO: do not build tools (belr-demo, belr-parse)
Note for packagers
Our CMake scripts may automatically add some paths to the runtime search paths (rpath) of the generated binaries.
To ensure that the installed binaries are stripped of any rpath, pass -DCMAKE_SKIP_INSTALL_RPATH=ON when you invoke cmake.
An RPM package of belr can be generated with cmake3 using the following commands:
mkdir WORK
cd WORK
cmake3 ../
make package_source
rpmbuild -ta --clean --rmsource --rmspec belr--.tar.gz