lexor.core package
The core of lexor is divided among the modules in this package.
| Module | Description |
|---|---|
| node | Provides the most basic structure to create the document object model (DOM). |
| elements | Defines the basic structures that hold the information provided in files. Familiarize yourself with all the objects in this module before writing extensions for the Parser, Converter and Writer. |
| parser | Provides the Parser and the abstract class NodeParser, which helps us write derived objects that parse new languages. |
| converter | Provides the Converter and the abstract class NodeConverter, which helps us copy a Document that we want to convert to another language. |
| writer | Provides the Writer and the abstract class NodeWriter, whose subclasses tell the Writer how to write a Node to a file object. |
lexor.core.parser module
Parser Module
Provides the Parser object which defines the basic mechanism for parsing character sequences. This involves using objects derived from the abstract class NodeParser.
- class lexor.core.parser.NodeParser(parser)
Bases: object
An object with two methods: make_node and close. Derived objects are required to override the first one.
- close(_)
This method must be overridden if the node parser's make_node method can return a bare Node (one not wrapped in a list).
This method is not called if make_node returned the Node inside a list. close takes as input the Node object that make_node returned and decides whether the node can be closed. If it is time to close the Node, return a list with the position where the Node is being closed; otherwise return None.
If this method is not overridden, a NotImplementedError exception is raised.
- make_node()
This method must be overridden by the derived node parser. It returns None if the node parser cannot create a node from the current information in the parser. Otherwise it creates a Node object and returns it.
When returning a node you can tell the parser whether the node is complete. For instance, if your node parser creates an Element that has no children to be parsed, return a list containing only that node. This tells the parser that the node has been closed, so it will not call the close method of the node parser. If the Node cannot have children, for instance a ProcessingInstruction, RawText, or Void node, there is no need to wrap it in a list.
The Node object that this method returns also needs to have the attribute pos: a list of two integers stating the line and column number where the node was encountered in the text being parsed. The parser removes this attribute once it finishes processing the node.
If this method is not overridden as stated above, a NotImplementedError exception is raised.
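The make_node/close contract can be illustrated with a small, self-contained sketch. Node, ToyParser and BlockNP below are hypothetical stand-ins written for this example, not lexor classes; ToyParser only mimics the text and caret attributes and the compute method described for Parser, and 1-based [line, column] numbering is an assumption. BlockNP recognizes "{ ... }" blocks: it returns None when the text at the caret does not start with "{", returns [node] for an empty block so that close is skipped, and otherwise returns the bare node so the parser later asks close whether the "}" has been reached.

```python
class Node:
    """Hypothetical stand-in for a lexor Node (not lexor's class)."""

    def __init__(self, name):
        self.name = name
        self.pos = None  # [line, column] where the node was found


class ToyParser:
    """Hypothetical stand-in exposing text, caret and compute."""

    def __init__(self, text):
        self.text = text
        self.caret = 0

    def compute(self, index):
        # [line, column] of index; 1-based numbering is an assumption.
        line = self.text.count("\n", 0, index) + 1
        column = index - (self.text.rfind("\n", 0, index) + 1) + 1
        return [line, column]


class NodeParser:
    """Mirrors the documented interface: both methods must be overridden."""

    def __init__(self, parser):
        self.parser = parser

    def make_node(self):
        raise NotImplementedError

    def close(self, _):
        raise NotImplementedError


class BlockNP(NodeParser):
    """Hypothetical node parser for { ... } blocks."""

    def make_node(self):
        parser = self.parser
        if not parser.text.startswith("{", parser.caret):
            return None  # cannot build a node here; let another parser try
        node = Node("block")
        node.pos = parser.compute(parser.caret)
        parser.caret += 1
        if parser.text.startswith("}", parser.caret):
            parser.caret += 1
            return [node]  # already complete: close() will not be called
        return node  # children follow: close() decides when it ends

    def close(self, node):
        parser = self.parser
        if parser.text.startswith("}", parser.caret):
            pos = parser.compute(parser.caret)
            parser.caret += 1
            return pos  # position where the node is being closed
        return None  # not time to close yet
```

In a real parser the caret would be advanced by other node parsers while the block's children are parsed; here it is moved by hand to keep the sketch short.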
- class lexor.core.parser.Parser(lang='xml', style='default', defaults=None)
Bases: object
For the languages it is able to parse, see the lexor.lang module.
- caret_position
The index in the text the parser is processing. If performance is an issue, use the attribute caret directly.
- cdata
The character sequence data that was last processed by the parse method. If performance is an issue, use the attribute text directly.
- compute(index)
Returns a position in the text, [line, column], given an index. Note: this does not modify anything in the parser; it only gives you the line and column where the caret would be at that index. The same restriction as in update applies: do not call compute with an index less than the current position of the caret.
- document
The parsed document. This is a Document or DocumentFragment created by the parse method.
- lexor_log
The lexor_log document. Inspect it after each call to parse to see warnings and errors in the text that was parsed.
- load_node_parsers()
Loads the node parsers. parse calls this method automatically, but only if the settings have changed.
- msg(mod_name, code, pos, arg=None, uri=None)
Provide the name of the module issuing the message, the code number, the position of the caret, and optional arguments and uri. This information gets stored in the log.
- parse(text, uri=None)
Parses the given text. The results are available through the document and lexor_log properties. If no uri is given, document will be a DocumentFragment node.
- position
Position of the caret in the text in terms of line and column, i.e. [line, column]. If performance is an issue, use the attribute pos directly.
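The restriction on compute suggests that the line and column are obtained by scanning forward from the caret, whose [line, column] is already known. The function below is a hedged sketch of that idea, not lexor's code: it takes the text, the caret index and the caret's known position as explicit arguments (an assumption; the real method reads them from the parser), examines only the text between caret and index, and therefore rejects an index that precedes the caret. Columns are assumed to be 1-based.

```python
def compute(text, caret, pos, index):
    """Return [line, column] for index, scanning forward from caret.

    Hypothetical sketch: pos is the already known [line, column] of
    caret, and columns are assumed to be 1-based.
    """
    if index < caret:
        # Only text after the caret is scanned, so earlier indices fail.
        raise ValueError("index must not be less than the caret")
    line, column = pos
    newlines = text.count("\n", caret, index)
    if newlines:
        line += newlines
        # Distance from the last newline before index gives the column.
        column = index - text.rfind("\n", caret, index)
    else:
        column += index - caret
    return [line, column]
```

Because each call only examines the slice between caret and index, the cost stays proportional to the text not yet consumed, which is one plausible reason for the documented restriction.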
lexor.core.converter module
Converter Module
Provides the Converter object which defines the basic mechanism for converting the objects defined in lexor.core.elements. This involves using objects derived from the abstract class NodeConverter.
- class lexor.core.converter.BaseLog(converter)
Bases: object
A simple class to provide messages to a converter. Derive a class from it in the module that will be issuing the messages. For instance:
    class Log(BaseLog):
        pass
After that you can create a new object and use it in the module:
    log = Log(converter)
where converter is a Converter provided to the module. Make sure that the module contains the objects MSG and MSG_EXPLANATION.
- class lexor.core.converter.Converter(fromlang='xml', tolang='xml', style='default', defaults=None)
Bases: object
For the languages available to the Converter, see the lexor.lang module.
- document
The converted document. This is a Document or DocumentFragment created by the convert method.
- exec_python(node, id_num, parser, error=True)
Executes the contents of the processing instruction node. You must provide an id number identifying the processing instruction and a parser that will parse the output produced by the execution; the code runs in the namespace returned by get_lexor_namespace. If error is True, any errors generated during the execution are appended to the output of the document.
- lexor_log
The lexor_log document. Inspect it after each call to convert to see warnings and errors.
- match_info(fromlang, tolang, style, defaults=None)
Checks whether the given languages, style and defaults match the converter's main information.
- msg(mod_name, code, node, arg=None, uri=None)
Provide the name of the module issuing the message, the code number, the node with the error, and optional arguments and uri. This information gets stored in the log.
- class lexor.core.converter.NodeConverter(converter)
Bases: object
A node converter is an object which determines whether the node will be copied (the default). To avoid copying the node, declare
    copy = False
when deriving a node converter. Note that by default the children of the node (if any) will be copied and assigned to the parent. To avoid copying the children, set
    copy_children = False
- classmethod end(node)
This method gets called after all the children have been copied. Make sure to return the node or its replacement.
- classmethod start(node)
This method gets called only if copy is set to True (default). By overriding this method you have access to the converter and the node, so you can set extra variables in the converter or modify the node. DO NOT modify any of the parents of the node. If any of the parents needs to be modified, set a variable in the converter that points to the node so that it can be modified later on in the convert function.
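A toy model can make the copy, copy_children, start and end rules concrete. Everything below (Node, the two derived converters and the convert loop) is hypothetical and only mirrors the behavior described above; in particular, passing the fresh copy to start and calling end only for nodes that were actually copied are assumptions. DropNC skips its own node but keeps the children, which are handed to the parent, and UpperNC edits its copy in end after the children have been copied.

```python
class Node:
    """Hypothetical minimal node: a name and a list of children."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


class NodeConverter:
    copy = True           # copy the node itself (default)
    copy_children = True  # copy its children (default)

    def __init__(self, converter):
        self.converter = converter

    def start(self, node):
        """Called only when copy is True; may adjust the copy."""

    def end(self, node):
        """Called after the children are copied; must return a node."""
        return node


class DropNC(NodeConverter):
    copy = False  # skip this node; its copied children go to the parent


class UpperNC(NodeConverter):
    def end(self, node):
        node.name = node.name.upper()
        return node


def convert(node, converters, converter=None):
    """Toy copy loop; returns the list of nodes to attach to the parent."""
    nc = converters.get(node.name, NodeConverter)(converter)
    clone = None
    if nc.copy:
        clone = Node(node.name)
        nc.start(clone)
    kids = []
    if nc.copy_children:
        for child in node.children:
            kids.extend(convert(child, converters, converter))
    if clone is None:
        return kids
    clone.children = kids
    return [nc.end(clone)]
```

For a tree root(wrap(a, b), c) converted with {"wrap": DropNC, "a": UpperNC}, the copy is root(A, b, c) while the original tree is left untouched.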
- lexor.core.converter.echo(node)
Allows the insertion of Nodes generated within Python embeddings. For example:
    <?python
    comment = PI('!--', 'This is a comment')
    echo(comment)
    ?>
- lexor.core.converter.get_converter_namespace()
Many converters may be defined during the conversion of a document. In some cases we may need to save references to objects in documents. If so, call this function to obtain the namespace where you can save those references.
- lexor.core.converter.get_current_node()
Return the Document node containing the Python embeddings currently being executed.
- lexor.core.converter.get_lexor_namespace()
The execution of Python instructions takes place in the namespace provided by this function.
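The interplay between exec_python, echo and get_lexor_namespace can be sketched with the standard library alone. In this hypothetical version a single module-level dictionary plays the role of the namespace returned by get_lexor_namespace, so definitions persist between processing instructions; echo collects nodes instead of inserting them into a document; and printed output is captured as a string, which a real converter would hand to the parser. What lexor does with errors when error is False is not stated above; this sketch simply drops the traceback in that case.

```python
import contextlib
import io
import traceback

# Hypothetical stand-in for the dictionary returned by get_lexor_namespace().
LEXOR_NAMESPACE = {}
# Nodes passed to echo() during the current execution.
ECHOED = []


def echo(node):
    """Toy echo: record the node instead of inserting it into a document."""
    ECHOED.append(node)


def exec_python(code, error=True):
    """Run code in the shared namespace; return (printed text, echoed nodes)."""
    ECHOED.clear()
    LEXOR_NAMESPACE["echo"] = echo
    out = io.StringIO()
    with contextlib.redirect_stdout(out):
        try:
            exec(code, LEXOR_NAMESPACE)
        except Exception:
            if error:
                # Append the error report to the output, as described above.
                out.write(traceback.format_exc())
    return out.getvalue(), list(ECHOED)
```

Running "x = 2" in one call and "print(x)" in a later call prints 2, because both calls share the same namespace.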
lexor.core.writer module
Writer Module
Provides the Writer object which defines the basic mechanism for writing the objects defined in lexor.core.elements. This involves using objects derived from the abstract class NodeWriter. See lexor.core.dev for more information on how to write objects derived from NodeWriter to be able to write Documents in the way you desire.
- class lexor.core.writer.DefaultWriter(writer)
Bases: lexor.core.writer.NodeWriter
If the language does not define a NodeWriter for __default__, the writer uses this default writer.
- class lexor.core.writer.NodeWriter(writer)
Bases: object
A node writer is an object which writes a node in three steps: start, data/child, end.
- classmethod child(_)
This method gets called for Elements that have children. If you override it, the Writer will not traverse the child nodes unless the method returns something other than None.
By default this method returns True so that the Writer traverses the child nodes.
- data(node)
This method gets called only for CharacterData nodes. Override it to control how their data attribute is written; otherwise the node's data is written as is.
- end(node)
Override this method to write the part of the Node object that belongs to the last encounter with the Node.
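The start, data/child, end sequence can be illustrated with a toy traversal. Element, Text, TagNW, SkipNW and write are hypothetical names for this sketch, and choosing a writer by element name (with "#text" for character data) is an assumption rather than lexor's lookup rule. The example shows the two behaviors described above: data is used only for character data, and a child method that returns None stops the traversal of that element's children.

```python
class Element:
    """Hypothetical element: a name and a list of children."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


class Text:
    """Hypothetical character-data node."""

    def __init__(self, data):
        self.data = data


class NodeWriter:
    """Default behavior: write data as is and traverse children."""

    def __init__(self, out):
        self.out = out  # list of written fragments

    def start(self, node):
        pass

    def data(self, node):
        self.out.append(node.data)

    def child(self, _):
        return True  # anything but None lets the writer visit children

    def end(self, node):
        pass


class TagNW(NodeWriter):
    def start(self, node):
        self.out.append("<%s>" % node.name)

    def end(self, node):
        self.out.append("</%s>" % node.name)


class SkipNW(TagNW):
    def child(self, _):
        self.out.append("...")  # implicit None: children are not visited


def write(node, out, writers):
    """Toy writer loop: start, then data or children, then end."""
    nw = writers.get(getattr(node, "name", "#text"), NodeWriter)(out)
    nw.start(node)
    if isinstance(node, Text):
        nw.data(node)
    elif node.children and nw.child(node) is not None:
        for child in node.children:
            write(child, out, writers)
    nw.end(node)
```

With TagNW for both p and b, the tree p("hi ", b("there")) is written as <p>hi <b>there</b></p>; using SkipNW for b gives <p>hi <b>...</b></p>.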
- class lexor.core.writer.Writer(lang='xml', style='default', defaults=None)
Bases: object
For the languages in which a Writer object is able to write, see the lexor.lang module.
- endl(force=True, tot=1, tail=False)
Insert a new line character. By setting force to False you may omit inserting a new line character if the last character printed was already a new line character.
- normalize_buffer()
Normalizing means that the length of the buffer will be less than or equal to the wrapping width. Anything that exceeds the limit is flushed.
- string_buffer
The current string buffer. This is the string that will be printed once its length exceeds the writer's width.
- write(node, filename=None, mode='w')
Write node to a file or string. To write to a string, use the default parameters; otherwise provide a file name. If filename is provided, you may also specify the mode: 'w' or 'a'.
You may also pass a file you have opened yourself in place of filename so that the writer writes to that file.
Use the __str__ method (str(writer)) to retrieve the contents written to a string.
- lexor.core.writer.find_whitespace(line, start, lim)
Attempts to find the index of the first whitespace before lim; if none is found, it looks ahead.
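find_whitespace is the kind of helper a line-wrapping writer needs when normalize_buffer has to split the buffer near the wrapping width. The sketch below is one reading of the description above, not lexor's implementation: it searches backward from lim to start for whitespace and, failing that, looks ahead past lim; returning -1 when there is no whitespace at all is an assumption. The hypothetical wrap function shows how such a helper could break text into lines.

```python
def find_whitespace(line, start, lim):
    """Index of a whitespace in line suitable for a line break.

    Hypothetical reading of lexor's helper: search backward from lim
    down to start; if nothing is found, look ahead past lim.
    Returns -1 when no whitespace is found (an assumption).
    """
    lim = min(lim, len(line) - 1)
    for index in range(lim, start - 1, -1):
        if line[index].isspace():
            return index
    for index in range(lim + 1, len(line)):
        if line[index].isspace():
            return index
    return -1


def wrap(text, width):
    """Hypothetical helper: break text at whitespace, aiming for width."""
    lines = []
    while len(text) > width:
        cut = find_whitespace(text, 0, width)
        if cut == -1:
            break  # no whitespace at all: keep the long line intact
        lines.append(text[:cut])
        text = text[cut + 1:]
    lines.append(text)
    return lines
```

A line may still exceed the width when the only whitespace lies beyond lim; the look-ahead trades an overlong line for never splitting inside a word.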
lexor.core.selector module
Selector
This module tries to simulate jQuery selectors. If some code looks similar to that of the Sizzle CSS Selector engine, it is because the ideas were taken from it.
In short, credit goes to [Sizzle][1] and CSS for the selector idea.
[1]: http://sizzlejs.com/
- class lexor.core.selector.Selector(selector, node, results=None)
Bases: object
A jQuery-like object.
- after(*arg, **keywords)
Insert content, specified by the parameter, after each element in the set of matched elements.
.after(content [, content])
    content: A string, Node, list of Nodes, or Selector object to insert after each element in the set of matched elements.
    content (additional): One or more additional strings, Nodes, lists of Nodes, or Selector objects to insert after each element in the set of matched elements.
.after(function(node, index))
    function(node, index): A function that returns a string, Node(s), or Selector object to insert after each element in the set of matched elements. It receives the element in the set and its index position in the set as its arguments.
.after(..., lang='html', style='default', defaults=None)
    lang: The language in which strings will be parsed.
    style: The style in which strings will be parsed.
    defaults: A dictionary with string keys and values specifying options for the particular style.
- append(*arg, **keywords)
Insert content, specified by the parameter, at the end of each element in the set of matched elements.
Should behave similarly to https://api.jquery.com/append/. The major difference is the function form: a function passed as content should take two parameters, node and index, where node is the current element to which the return value will be appended.
- before(*arg, **keywords)
Insert content, specified by the parameter, before each element in the set of matched elements.
.before(content [, content])
    content: A string, Node, list of Nodes, or Selector object to insert before each element in the set of matched elements.
    content (additional): One or more additional strings, Nodes, lists of Nodes, or Selector objects to insert before each element in the set of matched elements.
.before(function(node, index))
    function(node, index): A function that returns a string, Node(s), or Selector object to insert before each element in the set of matched elements. It receives the element in the set and its index position in the set as its arguments.
.before(..., lang='html', style='default', defaults=None)
    lang: The language in which strings will be parsed.
    style: The style in which strings will be parsed.
    defaults: A dictionary with string keys and values specifying options for the particular style.
- contents()
Get the children of each element in the set of matched elements, including text and comment nodes.
- find(selector)
Get the descendants of each element in the current set of matched elements, filtered by a selector.
- prepend(*arg, **keywords)
Insert content, specified by the parameter, at the beginning of each element in the set of matched elements.
Should behave similarly to https://api.jquery.com/prepend/. The major difference is the function form: a function passed as content should take two parameters, node and index, where node is the current element to which the return value will be prepended.
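The function form of after, before, append and prepend can be sketched on a toy tree. Node, resolve, after and prepend below are hypothetical; string content and the lang, style and defaults options are left out because they require a lexor parser. Non-callable content is deep-copied for every matched element, in the spirit of clone_obj, so the same node object is never inserted twice; a callable is invoked once per element with (node, index).

```python
import copy


class Node:
    """Hypothetical tree node with a parent link."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def append_child(self, node):
        node.parent = self
        self.children.append(node)
        return node


def resolve(content, node, index):
    """Turn content into a fresh list of nodes for one matched element."""
    if callable(content):
        items = content(node, index)  # called once per matched element
    else:
        items = copy.deepcopy(content)  # new copy per element (cf. clone_obj)
    return items if isinstance(items, list) else [items]


def after(matched, content):
    """Insert content after each matched node, as following siblings."""
    for index, node in enumerate(matched):
        siblings = node.parent.children
        pos = siblings.index(node) + 1
        for new in resolve(content, node, index):
            new.parent = node.parent
            siblings.insert(pos, new)
            pos += 1


def prepend(matched, content):
    """Insert content at the beginning of each matched node's children."""
    for index, node in enumerate(matched):
        for offset, new in enumerate(resolve(content, node, index)):
            new.parent = node
            node.children.insert(offset, new)
```

Calling after([a, b], lambda node, i: Node("hr%d" % i)) on a list ul(a, b) yields ul(a, hr0, b, hr1), since the function receives each matched element and its index.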
- lexor.core.selector.clone_obj(obj, parser)
Utility function to create deep copies of objects used for the Selector object. A parser should be given in case the object is a string.
- lexor.core.selector.select(selector, context, results, seed)
A low-level selection function that works with Sizzle's compiled selector functions.
@param {String|Function} selector A selector or a pre-compiled selector function built with Sizzle.compile
@param {Element} context
@param {Array} [results]
@param {Array} [seed] A set of elements to match against
- lexor.core.selector.sizzle(selector, context, results=None, seed=None)
Function shamelessly borrowed and partially translated to Python from http://sizzlejs.com/.