lexor.core package¶

The core of lexor is divided among the modules in this package.

node:	Provides the most basic structure to create the document object model (DOM).
elements:	Here we define the basic structures to handle the information provided in files. Make sure to familiarize yourself with all the objects in this module to be able to write extensions for the `Parser`, `Converter` and `Writer`.
parser:	The parser module provides the `Parser` and the abstract class `NodeParser` which helps us write derived objects for future languages to parse.
converter:	The converter module provides the `Converter` and the abstract class `NodeConverter` which helps us copy a `Document` we want to convert to another language.
writer:	The writer module provides the `Writer` and the abstract class `NodeWriter` which, once subclassed, helps us tell the `Writer` how to write a `Node` to a file object.

lexor.core.parser module¶

Parser Module

Provides the Parser object which defines the basic mechanism for parsing character sequences. This involves using objects derived from the abstract class NodeParser.

class lexor.core.parser.NodeParser(parser)[source]¶

Bases: object

An object that has two methods: make_node and close. The first method is required to be overloaded in derived objects.

close(_)[source]¶

This method needs to be overloaded if the node parser returns a Node with the make_node method.

This method will not get called if make_node returned a Node inside a list. The close function takes as input the Node object that make_node returned and it should decide if the node can be closed or not. If it is indeed time to close the Node then return a list with the position where the Node is being closed, otherwise return None.

If this method is not overloaded then a NotImplementedError exception will be raised.

make_node()[source]¶

This method is required to be overloaded by the derived node parser. It returns None if the node parser will not be able to create a node from the current information in the parser. Otherwise it creates a Node object and returns it.

When returning a node you have the option of informing the parser whether the node is complete. For instance, if your node parser creates an Element that has no children to be parsed, return a list containing only that node. This tells the parser that the node has been closed, so it will not call the close method of the node parser. If the Node cannot have children, say ProcessingInstruction, RawText, or Void, there is no need to wrap the node in a list.

The Node object that this method returns also needs to have the property pos. This is a list of two integers stating the line and column number where the node was encountered in the text that is being parsed. This property will be removed by the parser once the parser finishes all processing with the node.

If this method is not overloaded as previously stated then a NotImplementedError exception will be raised.
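
The return conventions described above can be summarized in a small sketch. Nothing here is lexor code: classify_result, FakeElement and FakeVoid are hypothetical stand-ins, and the real Parser may organize this logic differently. The sketch only shows how a caller could tell apart the possible outcomes of make_node.

```python
class FakeElement:
    """Hypothetical stand-in for a node that may have children."""

    def __init__(self, name, pos):
        self.name = name
        self.pos = pos  # [line, column] where the node was encountered


class FakeVoid(FakeElement):
    """Hypothetical stand-in for a node type that never has children."""


def classify_result(result, childless_types=(FakeVoid,)):
    """Classify the value returned by a make_node-style method."""
    if result is None:
        return "no node"   # the node parser could not build a node here
    if isinstance(result, list):
        return "closed"    # wrapped in a list: close() is not needed
    if isinstance(result, childless_types):
        return "closed"    # childless node types need no wrapping
    return "open"          # close() decides later where the node ends


outcomes = [
    classify_result(None),
    classify_result([FakeElement("br", [1, 1])]),
    classify_result(FakeVoid("img", [1, 5])),
    classify_result(FakeElement("p", [2, 1])),
]
print(outcomes)
```

Only the last case would lead to a call to close, which is why close must be overloaded whenever make_node returns a bare node that can hold children.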

msg(code, pos, arg=None, uri=None)[source]¶: Send a message to the parser.

class lexor.core.parser.Parser(lang='xml', style='default', defaults=None)[source]¶

Bases: object

To see the languages that it is able to parse see the lexor.lang module.

caret_position[source]¶: The index in the text the parser is processing. You may use the attribute access caret if performance is an issue.

cdata[source]¶: The character sequence data that was last processed by the parse method. You may use the attribute access text if performance is an issue.

compute(index)[source]¶: Returns the position [line, column] in the text that corresponds to the given index. Note: This does not modify anything in the parser. It only gives you the line and column where the caret would be at that index. As with update, do not use compute with an index less than the current position of the caret.
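
The forward-only restriction is easier to see in a sketch. The function below is a hypothetical stand-alone version, not lexor's implementation: it takes the text, the caret index and the caret's [line, column] explicitly, and it assumes that lines and columns are counted from 1. Because it only counts newlines between the caret and the index, it cannot answer for an index behind the caret.

```python
def compute_position(text, caret, pos, index):
    """Return [line, column] for `index`, scanning forward from `caret`.

    `pos` is the [line, column] of `caret` (assumed 1-based).
    """
    if index < caret:
        raise ValueError("index must not be behind the caret")
    newlines = text.count("\n", caret, index)
    line = pos[0] + newlines
    if newlines == 0:
        # Same line: just shift the column by the distance travelled.
        column = pos[1] + (index - caret)
    else:
        # New line: the column is the distance from the last newline.
        column = index - text.rfind("\n", caret, index)
    return [line, column]


text = "ab\ncde\nf"
print(compute_position(text, 0, [1, 1], 5))  # position of "e"
```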

copy_pos()[source]¶: Returns a copy of the current position.

document[source]¶: The parsed document. This is a Document or DocumentFragment created by the parse method.

language[source]¶: The language in which the Parser object will parse character sequences.

lexor_log[source]¶: The lexor_log document. See this document after each call to parse to see warnings and errors in the text that was parsed.

load_node_parsers()[source]¶: Loads the node parsers. This function is called automatically when parse is called only if there was a change in the settings.

msg(mod_name, code, pos, arg=None, uri=None)[source]¶: Provide the name of the module issuing the message, the code number, the position of the caret, and optional arguments and uri. This information gets stored in the log.

parse(text, uri=None)[source]¶: Parses the given text. To see the results of this method, see the document and lexor_log properties. If no uri is given then document will return a DocumentFragment node.

parsing_style[source]¶: The style in which the Parser object will parse the character sequences.

position[source]¶: Position of caret in the text in terms of line and column. i.e. returns [line, column]. You may use the attribute access pos if performance is an issue.

set(lang, style, defaults=None)[source]¶: Set the language and style in one call.

update(index)[source]¶: Changes the position of the caret and updates pos. This function assumes that you are moving forward. Do not update to an index which is less than the current position of the caret.

uri[source]¶: The Uniform Resource Identifier. This is the name that was given to the text that was last parsed.

lexor.core.converter module¶

Converter Module

Provides the Converter object which defines the basic mechanism for converting the objects defined in lexor.core.elements. This involves using objects derived from the abstract class NodeConverter.

class lexor.core.converter.BaseLog(converter)[source]¶

Bases: object

A simple class to provide messages to a converter. You must derive an object from this class in the module which will be issuing the messages. For instance:

    class Log(BaseLog):
        pass

After that you can create a new object and use it in a module.

    log = Log(converter)

where converter is a Converter provided to the module. Make sure that the module contains the objects MSG and MSG_EXPLANATION.

msg(code, arg=None, uri=None)[source]¶: Send a message to the converter.

class lexor.core.converter.Converter(fromlang='xml', tolang='xml', style='default', defaults=None)[source]¶

Bases: object

To see the languages available to the Converter see the lexor.lang module.

convert(doc, namespace=False)[source]¶: Convert the Document doc.

convert_from[source]¶: The language from which the converter will convert.

convert_to[source]¶: The language to which the converter will convert.

converting_style[source]¶: The converter style.

document[source]¶: The converted document. This is a Document or DocumentFragment created by the convert method.

exec_python(node, id_num, parser, error=True)[source]¶: Executes the contents of the processing instruction. You must provide an id number identifying the processing instruction, the namespace where the execution takes place and a parser that will parse the output provided by the execution. If error is True then any errors generated during the execution will be appended to the output of the document.

lexor_log[source]¶: The lexor_log document. See this document after each call to convert to see warnings and errors.

match_info(fromlang, tolang, style, defaults=None)[source]¶: Check to see if the converter main information matches.

msg(mod_name, code, node, arg=None, uri=None)[source]¶: Provide the name of module issuing the message, the code number, the node with the error, optional arguments and uri. This information gets stored in the log.

pop()[source]¶: Remove the last document and last log document and return them.

static remove_node(node)[source]¶: Removes the node from the current document it is in. Returns the previous sibling if possible, otherwise it returns an empty Text node.

set(fromlang, tolang, style, defaults=None)[source]¶: Sets the languages and styles in one call.

update_log(log, after=True)[source]¶: Append the messages from a log document to the converter's log. Note that this removes the children from log.

class lexor.core.converter.NodeConverter(converter)[source]¶

Bases: object

A node converter is an object which determines if the node will be copied (default). To avoid copying the node simply declare

copy = False

when deriving a node converter. Note that by default, the children of the node (if any) will be copied and assigned to the parent. To avoid copying the children, set

copy_children = False

classmethod end(node)[source]¶: This method gets called after all the children have been copied. Make sure to return the node or the node replacement.

msg(code, node, arg=None, uri=None)[source]¶: Send a message to the converter.

classmethod start(node)[source]¶: This method gets called only if copy is set to True (default). By overloading this method you have access to the converter and the node. You can thus set extra variables in the converter or modify the node. DO NOT modify any of the parents of the node. If there is a need to modify any of the parents of the node, set a variable in the converter to point to the node so that it can be modified later on in the convert function.

lexor.core.converter.echo(node)[source]¶: Allows the insertion of Nodes generated within python embeddings.

    <?python
    comment = PI('!--', 'This is a comment')
    echo(comment)
    ?>

lexor.core.converter.get_converter_namespace()[source]¶: Many converters may be defined during the conversion of a document. In some cases we may need to save references to objects in documents. If this is the case, then call this function to obtain the namespace where you can save those references.

lexor.core.converter.get_current_node()[source]¶: Return the Document node containing the python embeddings currently being executed.

lexor.core.converter.get_lexor_namespace()[source]¶: The execution of python instructions takes place in the namespace provided by this function.

lexor.core.converter.import_module(mod_path, mod_name=None)[source]¶: Return a module from a path. If no name is provided then the name of the file loaded will be assigned to the name. When using relative paths, it will find the module relative to the file executing the python embedding.
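
As an illustration of loading a module from a path with a defaulted name, here is a minimal sketch built on the standard importlib machinery. It is an assumption about one way to obtain this behaviour, not lexor's own code; import_module_sketch is a hypothetical name, and the resolution of relative paths against the file running the python embedding is deliberately left out.

```python
import importlib.util
import os


def import_module_sketch(mod_path, mod_name=None):
    """Load the Python file at `mod_path` and return it as a module."""
    if mod_name is None:
        # Default the module name to the file name without its extension.
        mod_name = os.path.splitext(os.path.basename(mod_path))[0]
    spec = importlib.util.spec_from_file_location(mod_name, mod_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # run the file's top-level code
    return module
```

With a file named helper_mod.py, calling import_module_sketch on its path would return a module whose __name__ is helper_mod unless another name is passed.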

lexor.core.converter.include(input_file, **keywords)[source]¶: Inserts a file into the current node.

lexor.core.writer module¶

Writer Module

Provides the Writer object which defines the basic mechanism for writing the objects defined in lexor.core.elements. This involves using objects derived from the abstract class NodeWriter. See lexor.core.dev for more information on how to write objects derived from NodeWriter to be able to write Documents in the way you desire.

class lexor.core.writer.DefaultWriter(writer)[source]¶

Bases: lexor.core.writer.NodeWriter

If the language does not define a NodeWriter for __default__ then the writer will use this default writer.

end(node)[source]¶: Write the end of the node as an xml end tag.

start(node)[source]¶: Write the start of the node as an xml tag.

class lexor.core.writer.NodeWriter(writer)[source]¶

Bases: object

A node writer is an object which writes a node in three steps: start, data/child, end.

classmethod child(_)[source]¶

This method gets called for Elements that have children. If it gets overwritten then it will not traverse through child nodes unless you return something other than None.

This method by default returns True so that the Writer can traverse through the child nodes.

data(node)[source]¶: This method gets called only by CharacterData nodes. This method should be overloaded to write their attribute data, otherwise it will write the node’s data as it is.

end(node)[source]¶: Overload this method to write part of the Node object in the last encounter with the Node.

start(node)[source]¶: Overload this method to write part of the Node object in the first encounter with the Node.

write(string, split=False)[source]¶: Writes the string to a file object. The file object is determined by the Writer object that initialized this object (self).
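
The three steps (start, data/child, end) can be pictured with a small stand-alone sketch. Everything below is hypothetical: nodes are plain dictionaries, SketchNodeWriter is not derived from the real NodeWriter, and write_node is only a guess at how a Writer might drive a node writer. In particular, calling start and end on character data nodes is an assumption.

```python
class SketchNodeWriter:
    """Hypothetical stand-in for a NodeWriter subclass."""

    def __init__(self, out):
        self.out = out  # list that collects the written strings

    def start(self, node):
        if not node["name"].startswith("#"):
            self.out.append("<%s>" % node["name"])

    def data(self, node):
        self.out.append(node["data"])

    def child(self, node):
        return True  # anything other than None allows traversal

    def end(self, node):
        if not node["name"].startswith("#"):
            self.out.append("</%s>" % node["name"])


def write_node(node, node_writer):
    """Visit `node` in three steps: start, data or children, end."""
    node_writer.start(node)
    if "data" in node:
        node_writer.data(node)
    elif node.get("children") and node_writer.child(node) is not None:
        for kid in node["children"]:
            write_node(kid, node_writer)
    node_writer.end(node)


tree = {"name": "p", "children": [
    {"name": "#text", "data": "hi "},
    {"name": "em", "children": [{"name": "#text", "data": "there"}]},
]}
out = []
write_node(tree, SketchNodeWriter(out))
result = "".join(out)
print(result)
```

A child method that returns None would stop the traversal at that element, so only its start and end output would be written.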

class lexor.core.writer.Writer(lang='xml', style='default', defaults=None)[source]¶

Bases: object

To see the languages in which a Writer object is able to write see the lexor.lang module.

close()[source]¶: Close the file.

disable_raw()[source]¶: Turn off raw mode.

disable_wrap()[source]¶: Turn off wrapping.

enable_raw()[source]¶: Use this to set the writing in raw mode.

enable_wrap()[source]¶: Use this to set the writing in wrapping mode.

endl(force=True, tot=1, tail=False)[source]¶: Insert a new line character. By setting force to False you may omit inserting a new line character if the last character printed was already the new line character.
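
The effect of force can be sketched on a plain string. This is a hypothetical helper, not the Writer's method: it ignores the tot and tail parameters, whose behaviour is not described here, and it works on the text written so far instead of a file object.

```python
def endl_sketch(written, force=True):
    """Return `written` with a newline appended according to `force`."""
    if not force and written.endswith("\n"):
        return written  # already at the start of a fresh line
    return written + "\n"


print(repr(endl_sketch("line\n", force=False)))
print(repr(endl_sketch("line\n", force=True)))
```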

filename[source]¶: READ-ONLY: The name of the file to which a Node object was last written.

flush_buffer(tail=True)[source]¶: Empty the contents of the buffer.

get_node_writer(name)[source]¶: Return one of the NodeWriter objects available to the Writer.

indent[source]¶: The indentation at the beginning of each line.

language[source]¶: The language in which the Writer writes Node objects.

last()[source]¶: Returns the last written string with the contents of the buffer.

normalize_buffer()[source]¶: The term normalize means that the length of the buffer will be less than or equal to the wrapping width. Anything that exceeds the limit will be flushed.

raw_enabled()[source]¶: Determine if raw mode is enabled or not.

set(lang, style, defaults=None)[source]¶: Set the language and style in one call.

string_buffer[source]¶: The current string buffer. This is the string that will be printed after its length exceeds the writer’s width.

wrap_enabled()[source]¶: Determine if wrap mode is enabled or not.

write(node, filename=None, mode='w')[source]¶

Write node to a file or string. To write to a string use the default parameters, otherwise provide a file name. If filename is provided you have the option of specifying the mode: ‘w’ or ‘a’.

You may also provide a file you may have opened yourself in place of filename so that the writer writes to that file.

Use the __str__ function to retrieve the contents written to a string.

write_str(string, split=False)[source]¶: The write function is meant to be used with Node objects. Use this function to write simple strings while the file descriptor is open.

writing_style[source]¶: The style in which the Writer writes a Node object.

lexor.core.writer.find_whitespace(line, start, lim)[source]¶: Attempts to find the index of the first whitespace before lim. If it is not found, then it looks ahead.
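
One possible reading of this search is sketched below. It is an interpretation, not lexor's code: find_whitespace_sketch looks backward from lim toward start for the nearest whitespace, then looks forward past lim, and returns -1 when the line has no usable whitespace. The -1 convention and the treatment of lim as an absolute index are both assumptions.

```python
def find_whitespace_sketch(line, start, lim):
    """Return the index of a whitespace character near `lim`, or -1."""
    # First look back from lim toward start.
    for index in range(min(lim, len(line) - 1), start - 1, -1):
        if line[index].isspace():
            return index
    # Nothing before lim: look ahead instead.
    for index in range(lim + 1, len(line)):
        if line[index].isspace():
            return index
    return -1


print(find_whitespace_sketch("hello world again", 0, 8))
print(find_whitespace_sketch("helloworld again", 0, 4))
```

A wrapping writer could break the line at the returned index, which keeps words intact when a break before the limit is available.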

lexor.core.writer.replace(string, *key_val)[source]¶

Replacement of strings done in one pass. Example:

>>> replace("a < b && b < c", ('<', '&lt;'), ('&', '&amp;'))
'a &lt; b &amp;&amp; b &lt; c'

Source: <http://stackoverflow.com/a/15221068/788553>
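
Doing the replacement in one pass matters because chained calls to str.replace can rewrite text that an earlier call produced. The sketch below follows the general regular-expression approach of building one alternation from all keys; replace_sketch is a hypothetical name and the body is not necessarily identical to lexor's function.

```python
import re


def replace_sketch(string, *key_val):
    """Replace every key with its value in a single pass over `string`."""
    if not key_val:
        return string
    mapping = dict(key_val)
    # One pattern that matches any of the keys, each escaped literally.
    pattern = re.compile("|".join(re.escape(key) for key, _ in key_val))
    return pattern.sub(lambda match: mapping[match.group(0)], string)


text = "a < b && b < c"
one_pass = replace_sketch(text, ("<", "&lt;"), ("&", "&amp;"))
chained = text.replace("<", "&lt;").replace("&", "&amp;")
print(one_pass)  # a &lt; b &amp;&amp; b &lt; c
print(chained)   # a &amp;lt; b &amp;&amp; b &amp;lt; c
```

The chained version escapes the ampersand that the first replacement introduced, which the single pass avoids.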

lexor.core.selector module¶

Selector

This module is trying to simulate jQuery selectors. If some code looks similar to that of the Sizzle CSS Selector engine it is because the ideas were taken from it.

In short, credit goes to [Sizzle][1] and CSS for the selector idea.

[1]: http://sizzlejs.com/

class lexor.core.selector.Selector(selector, node, results=None)[source]¶

Bases: object

jQuery-like object.

after(*arg, **keywords)[source]¶

Insert content, specified by the parameter, after each element in the set of matched elements.

: .after(content [, content])

:: content (Type: htmlString or Element or Array or jQuery) String, Node, array of Node, or Selector object to insert after each element in the set of matched elements.

:: content (Type: htmlString or Element or Array or jQuery) One or more additional DOM elements, arrays of elements, HTML strings, or jQuery objects to insert after each element in the set of matched elements.

: .after(function(node, index))

:: function(node, index) A function that returns a string, DOM element(s), or Selector object to insert after each element in the set of matched elements. Receives the element in the set and its index position in the set as its arguments.

: .after(..., lang='html', style='default', defaults=None)

:: lang The language in which strings will be parsed.

:: style The style in which strings will be parsed.

:: defaults A dictionary with string keywords and values specifying options for the particular style.

append(*arg, **keywords)[source]¶

Insert content, specified by the parameter, to the end of each element in the set of matched elements.

Should behave similarly to https://api.jquery.com/append/. The major difference is in the function. When passing a function it should take 2 parameters: node and index, where node is the current element to which the return value will be appended.
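
The callback convention can be illustrated without the Selector class. The sketch below is hypothetical: matched elements are plain dictionaries with a children list, and append_sketch only mimics the calling pattern of a function that receives node and index. The real method also parses strings and clones nodes, which is omitted here.

```python
def append_sketch(matched, content):
    """Append `content`, or its per-node result, to every matched node."""
    for index, node in enumerate(matched):
        # A callable receives the current node and its index in the set.
        value = content(node, index) if callable(content) else content
        node["children"].append(value)
    return matched


items = [
    {"name": "li", "children": []},
    {"name": "li", "children": ["x"]},
]
append_sketch(items, lambda node, index: "%s #%d" % (node["name"], index))
print(items[1]["children"])
```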

before(*arg, **keywords)[source]¶

Insert content, specified by the parameter, before each element in the set of matched elements.

: .before(content [, content])

:: content (Type: htmlString or Element or Array or jQuery) String, Node, array of Node, or Selector object to insert before each element in the set of matched elements.

:: content (Type: htmlString or Element or Array or jQuery) One or more additional DOM elements, arrays of elements, HTML strings, or jQuery objects to insert before each element in the set of matched elements.

: .before(function(node, index))

:: function(node, index) A function that returns a string, DOM element(s), or Selector object to insert before each element in the set of matched elements. Receives the element in the set and its index position in the set as its arguments.

: .before(..., lang='html', style='default', defaults=None)

:: lang The language in which strings will be parsed.

:: style The style in which strings will be parsed.

:: defaults A dictionary with string keywords and values specifying options for the particular style.

contents()[source]¶: Get the children of each element in the set of matched elements, including text and comment nodes.

find(selector)[source]¶: Get the descendants of each element in the current set of matched elements, filtered by a selector.

prepend(*arg, **keywords)[source]¶

Insert content, specified by the parameter, to the beginning of each element in the set of matched elements.

Should behave similarly to https://api.jquery.com/prepend/. The major difference is in the function. When passing a function it should take 2 parameters: node and index, where node is the current element to which the return value will be prepended.

lexor.core.selector.clone_obj(obj, parser)[source]¶: Utility function to create deep copies of objects used for the Selector object. A parser should be given in case the object is a string.

lexor.core.selector.get_date()[source]¶: Obtain an integer representation of the date.

lexor.core.selector.mark_function(fnc)[source]¶: Mark a function for special use by Sizzle.

lexor.core.selector.select(selector, context, results, seed)[source]¶

A low-level selection function that works with Sizzle’s compiled selector functions.

@param {String|Function} selector A selector or a pre-compiled selector function built with Sizzle.compile

@param {Element} context

@param {Array} [results]

@param {Array} [seed] A set of elements to match against

lexor.core.selector.sizzle(selector, context, results=None, seed=None)[source]¶: Function shamelessly borrowed and partially translated to python from http://sizzlejs.com/.

lexor.core.selector.tokenize(selector, parse_only=False)[source]¶: Tokenize...