
Prologue

This article offers a thorough treatise on cvrfparse, an aptly named tool developed in-house for parsing and validating CVRF documents. The article is split into two parts. The first part, intended for CVRF document producers and consumers, is a hands-on manual for using cvrfparse. The second part, intended for burgeoning Python programmers, explores some of the tool’s inner workings.

Introduction

The CVRF parser, or “cvrfparse”, is a Python-based command-line tool that offers simple parsing and validation of CVRF documents. Using it, you can quickly query a CVRF document for any of its contents. For example, let’s say one of your vendors releases a bundle of security advisories encoded in CVRF. The bundle contains a dozen individual CVRF documents, each covering multiple vulnerabilities across hundreds of products. Using cvrfparse, you can quickly find out which documents mention vulnerable products you might have installed in your infrastructure. We’ll see how shortly.

Cvrfparse is a validating parser. Before you start looking for data in a CVRF document, you might want to quickly check that the document is well-formed and valid (in fact, you’ll need a well-formed document before you can parse it at all, and a valid one before you can trust what you parse). This is especially useful for document producers who provide CVRF content to their customers.

Without further ado, let’s get to it and check out the tool. You can download the tool as a Python package from the Python Package Index (PyPI) or check out the source on GitHub. The only third-party code you may need to install is the lxml library. The easiest way to install cvrfparse and all required dependencies is to use pip. A typical invocation would be:

[sb:~] mike% pip install cvrfparse

The sample CVRF document used in the examples below is included in the distribution of the tool.

Need a CVRF Refresher?

If you’re working with CVRF at any level, the two-part CVRF Missing Manual blog series is highly recommended. In fact, savvy readers will notice that the sample CVRF document included with cvrfparse is the same one created for that blog series.

The Tool: Cvrfparse

Before we dive into some examples, let’s first explore all of the options we can specify when using the tool. To do that, we invoke cvrfparse with the “help” switch:

[sb:~/cvrfparse] mike% ./cvrfparse/cvrfparse.py --help
usage: cvrfparse.py [-h] -f FILE
                    [--cvrf [{all,DocumentTitle,DocumentType,DocumentPublisher,DocumentTracking,DocumentNotes,DocumentDistribution,AggregateSeverity,DocumentReferences,Acknowledgments} ...]]
                    [--vuln [{all,Title,ID,Notes,DiscoveryDate,ReleaseDate,Involvements,CVE,CWE,ProductStatuses,Threats,CVSSScoreSets,Remediations,References,Acknowledgments} ...]]
                    [--prod [{all,Branch,FullProductName,Relationship,ProductGroups} ...]]
                    [-c] [-s] [-V] [-S SCHEMA] [-C CATALOG] [-v]

Validate/parse a CVRF 1.1 document and emit user-specified bits.

optional arguments:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  candidate CVRF 1.1 XML file
  --cvrf [{all,DocumentTitle,DocumentType,DocumentPublisher,DocumentTracking,DocumentNotes,DocumentDistribution,AggregateSeverity,DocumentReferences,Acknowledgments} ...]
                        emit CVRF elements, use "all" to glob all CVRF
                        elements.
  --vuln [{all,Title,ID,Notes,DiscoveryDate,ReleaseDate,Involvements,CVE,CWE,ProductStatuses,Threats,CVSSScoreSets,Remediations,References,Acknowledgments} ...]
                        emit Vulnerability elements, use "all" to glob all
                        Vulnerability elements.
  --prod [{all,Branch,FullProductName,Relationship,ProductGroups} ...]
                        emit ProductTree elements, use "all" to glob all
                        ProductTree elements.
  -c, --collate         collate all of the Vulnerability elements by ordinal
                        into separate files
  -s, --strip-ns        strip namespace header from element tags before
                        printing
  -V, --validate        validate the CVRF document
  -S SCHEMA, --schema SCHEMA
                        specify local alternative for cvrf.xsd
  -C CATALOG, --catalog CATALOG
                        specify location for catalog.xml (default is
                        ./cvrfparse/schemata/catalog.xml)
  -v, --version         show program's version number and exit

While the help display might seem to offer a lot of confusing options, if you know CVRF, it’s really quite simple. The following list explains each command line option in detail:

  • file: The first argument is the CVRF document you intend to explore. This is the only mandatory argument, but if you don’t specify any other options, the tool has nothing to do, and, well, that’s just boring. Let’s see what else we can do.
  • cvrf/vuln/prod: These should look familiar: they correspond to the XML namespaces of the three CVRF schemas (document, vulnerability and product tree). Using one or more of these options lets you pick out exactly the elements you want to parse from a CVRF file. The keyword all tells cvrfparse to emit all of the elements in that namespace.
  • collate: This option tells cvrfparse to collate each Vulnerability container into a separate file, identified by its Vulnerability Ordinal.
  • strip-ns: If you’re tired of seeing the namespace prefix before every emitted element, you can specify this option to remove it.
  • validate: This option will attempt to fetch the CVRF 1.1 schema files and validate your CVRF document.
  • schema: This option enables the user to specify a local alternative for the schema files (all of which are included with the distribution of the tool). We’ll learn more about this guy below.
  • catalog: Finally, this option enables the user to specify the location of the catalog.xml file. Like the schema option, we’ll learn more about this guy below.
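For the curious, here is a hypothetical sketch of how a command line of this shape can be declared with Python’s standard argparse module. It is not cvrfparse’s actual source: the flag names, element choices and help strings are copied from the help output above, build_parser() is an illustrative name, and --vuln, --collate, --schema, --catalog and --version are left out for brevity.

```python
import argparse

# Element choices copied from the --help output above
CVRF_CHOICES = ["all", "DocumentTitle", "DocumentType", "DocumentPublisher",
                "DocumentTracking", "DocumentNotes", "DocumentDistribution",
                "AggregateSeverity", "DocumentReferences", "Acknowledgments"]
PROD_CHOICES = ["all", "Branch", "FullProductName", "Relationship", "ProductGroups"]


def build_parser():
    """Hypothetical reconstruction of a subset of the cvrfparse command line."""
    parser = argparse.ArgumentParser(
        description="Validate/parse a CVRF 1.1 document and emit user-specified bits.")
    parser.add_argument("-f", "--file", required=True,
                        help="candidate CVRF 1.1 XML file")
    # nargs="*" accepts zero or more element names; choices rejects unknown ones
    parser.add_argument("--cvrf", nargs="*", choices=CVRF_CHOICES,
                        help='emit CVRF elements, use "all" to glob all CVRF elements.')
    parser.add_argument("--prod", nargs="*", choices=PROD_CHOICES,
                        help='emit ProductTree elements, use "all" to glob all '
                             'ProductTree elements.')
    # store_true flags default to False when omitted
    parser.add_argument("-s", "--strip-ns", action="store_true",
                        help="strip namespace header from element tags before printing")
    parser.add_argument("-V", "--validate", action="store_true",
                        help="validate the CVRF document")
    return parser


args = build_parser().parse_args(
    ["--file", "doc.xml", "--cvrf", "DocumentTitle", "DocumentType", "--strip-ns"])
print(args.file, args.cvrf, args.strip_ns, args.validate)
# prints: doc.xml ['DocumentTitle', 'DocumentType'] True False
```

Pairing nargs="*" with choices means a misspelled element name is rejected by argparse before any XML is read.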

Cvrfparse Command-line Examples: Remote Validation

Now that we have the options down, let’s explore a simple standard invocation: validating a CVRF document against the remote schema files. This may sound intimidating, but it’s actually the easiest (and default) way to ensure you have a valid and well-formed CVRF document to work with. Here’s how to do it:

[sb:~/cvrfparse] mike% ./cvrfparse/cvrfparse.py --file cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000.xml --validate
Fetching schemata...
Valid

Ok, that was easy. Now that we know what it looks like when we work with a valid and well-formed CVRF document, let’s muck with it a bit and see an example of when validation fails:

[sb:~/cvrfparse] mike% sed 's/<InitialReleaseDate>2011-05-25T00:00:00+00:00/<InitialReleaseDate>TODAY/' cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000.xml > cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000-invalid.xml
[sb:~/cvrfparse] mike% ./cvrfparse/cvrfparse.py --file cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000-invalid.xml --validate
Fetching schemata...
cvrfparse/CVRF-1.1-cisco-sa-20110525-rvs4000-invalid.xml:31:0:ERROR:SCHEMASV:SCHEMAV_CVC_DATATYPE_VALID_1_2_1: Element '{http://www.icasi.org/CVRF/schema/cvrf/1.1}InitialReleaseDate': 'TODAY' is not a valid value of the atomic type 'xs:dateTime'.

Ah. That’s nifty. Cvrfparse not only told us the document was invalid, but also exactly where and how it was invalid.

For our next example, let’s see what happens when the document is not well-formed:

[sb:~/cvrfparse] mike% sed 's/InitialReleaseDate/InitialReleaseDatefoo/' cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000.xml > cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000-notwellformed.xml
[sb:~/cvrfparse] mike% ./cvrfparse/cvrfparse.py --file cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000-notwellformed.xml --validate
cvrfparse.py: Parsing error, document "cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000-notwellformed.xml" is not well-formed: cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000-notwellformed.xml:31:96:FATAL:PARSER:ERR_TAG_NAME_MISMATCH: Opening and ending tag mismatch: InitialReleaseDatefoo line 31 and InitialReleaseDate

Again, cvrfparse found the error and told us exactly where and what it is. Oh cvrfparse, what can’t you do!
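The distinction the tool draws here, well-formedness (can the XML be parsed at all?) versus validity (does it conform to the schema?), is easy to reproduce. The sketch below uses Python’s standard-library xml.etree.ElementTree rather than lxml, so it runs without extra dependencies, and it covers only the well-formedness half, since the standard library has no XML Schema validator. check_well_formed() and its message format are illustrative, not cvrfparse’s.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, root) if xml_text is well-formed, else (False, message)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as e:
        # position is (line, column): lines start at 1, columns at 0
        line, column = e.position
        return False, "not well-formed at line {0}, column {1}: {2}".format(line, column, e)
    return True, root


# Same kind of damage as the sed command above: the opening and closing tags disagree
broken = ("<DocumentTracking>\n"
          "  <InitialReleaseDatefoo>2011-05-25</InitialReleaseDate>\n"
          "</DocumentTracking>")
ok, info = check_well_formed(broken)
print(ok, info)
```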

Cvrfparse Command-line Examples: Local Validation

Normally, when --validate is specified, cvrfparse fetches the remote schemata from all over the Internet. While this is the simplest way to invoke the validation logic, it’s also the slowest and can take several seconds to complete. For a single document this is probably acceptable, but if you’re doing bulk validation and running cvrfparse from a script or in a pipeline, there is a faster way. You can have cvrfparse use local copies of the schema files it needs, resulting in a dramatic performance increase (on my home machine with a 20 Mbps cable modem, I saw a 50x speedup). To make local validation possible, cvrfparse ships with copies of all of the required schema files and a catalog file that points to them. To invoke local validation, use the --schema option to point to the CVRF 1.1 schema file and the --catalog option to point to the local catalog.xml (--catalog can be omitted if catalog.xml is in the default location, ./cvrfparse/schemata/catalog.xml).

[sb:~/cvrfparse] mike% ./cvrfparse/cvrfparse.py --file cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000.xml --validate --schema cvrfparse/schemata/cvrf/1.1/cvrf.xsd --catalog cvrfparse/schemata/catalog.xml
Valid
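Because local validation is fast, it lends itself to scripting. Here is a minimal Python sketch of the bulk-validation use case described above. It drives the command-line tool with only the flags shown in the help output and assumes the script lives at ./cvrfparse/cvrfparse.py, as in these examples. The helper names are illustrative, and the rule that a passing run ends its standard output with the line “Valid” is inferred from the examples above, so check it (or the exit status) against your version of cvrfparse.

```python
import subprocess

SCRIPT = "./cvrfparse/cvrfparse.py"  # path used in the examples above


def build_command(path, schema, catalog):
    """Command line for one local-validation run (flags from --help)."""
    return [SCRIPT, "--file", path, "--validate",
            "--schema", schema, "--catalog", catalog]


def looks_valid(stdout):
    """Assumption: a passing run ends its standard output with the line "Valid"."""
    lines = stdout.strip().splitlines()
    return bool(lines) and lines[-1] == "Valid"


def validate_many(paths, schema, catalog):
    """Validate each file in turn and return {path: True/False}."""
    results = {}
    for path in paths:
        proc = subprocess.run(build_command(path, schema, catalog),
                              capture_output=True, text=True)
        results[path] = looks_valid(proc.stdout)
    return results
```

For example, validate_many(glob.glob("cvrfparse/sample-xml/*.xml"), "cvrfparse/schemata/cvrf/1.1/cvrf.xsd", "cvrfparse/schemata/catalog.xml") would validate every sample document against the bundled schemata.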

Cvrfparse Command-line Examples: Element Emission

Once we’re sure we have a well-formed and valid CVRF document, we can start emitting some elements. A common use-case would be to query a document for the Document Title and Document Type:

[sb:~/cvrfparse] mike% ./cvrfparse/cvrfparse.py --file cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000.xml --cvrf DocumentTitle DocumentType
[{http://www.icasi.org/CVRF/schema/cvrf/1.1}DocumentTitle] Cisco Security Advisory: Cisco RVS4000 and WRVS4400N Web Management Interface Vulnerabilities
[{http://www.icasi.org/CVRF/schema/cvrf/1.1}DocumentType] Security Advisory

Sweet. Now, if you don’t want to see that pesky namespace header preceding every line of output, use the --strip-ns option:

[sb:~/cvrfparse] mike% ./cvrfparse/cvrfparse.py --file cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000.xml --cvrf DocumentTitle DocumentType --strip-ns
[DocumentTitle] Cisco Security Advisory: Cisco RVS4000 and WRVS4400N Web Management Interface Vulnerabilities
[DocumentType] Security Advisory

Ah, much better. Another useful example is to emit the Product Tree Full Product Name elements with their corresponding Product ID attributes:

[sb:~/cvrfparse] mike% ./cvrfparse/cvrfparse.py --file cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000.xml --prod FullProductName --strip-ns
[FullProductName] Cisco RVS4000 Gigabit Security Router version 1
(ProductID: CVRF1.1-PID-0001)
[FullProductName] Cisco RVS4000 Gigabit Security Router version 2
(ProductID: CVRF1.1-PID-0002)
[FullProductName] Cisco RVS4000 Gigabit Security Router version 1.3.3.5
(ProductID: CVRF1.1-PID-0006)
[FullProductName] Cisco RVS4000 Gigabit Security Router version 2.0.2.7
(ProductID: CVRF1.1-PID-0007)
[FullProductName] Cisco WRVS4400N Wireless-N Gigabit Security Router version 1.0
(ProductID: CVRF1.1-PID-0003)
[FullProductName] Cisco WRVS4400N Wireless-N Gigabit Security Router version 1.1
(ProductID: CVRF1.1-PID-0004)
[FullProductName] Cisco WRVS4400N Wireless-N Gigabit Security Router version 2.0
(ProductID: CVRF1.1-PID-0005)
[FullProductName] Cisco WRVS4400N Wireless-N Gigabit Security Router version 2.0.2.1
(ProductID: CVRF1.1-PID-0008)
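The introduction’s use case, finding out which advisories mention products in your own infrastructure, comes down to this kind of query. Below is a standard-library sketch of the same idea. The product tree namespace URI is an assumption, the sample XML is trimmed to two products, and affected_by_inventory() is a hypothetical helper with deliberately naive, case-insensitive substring matching; it is not part of cvrfparse.

```python
import xml.etree.ElementTree as ET

# Assumption: the CVRF 1.1 product tree namespace URI
PROD_NS = "http://www.icasi.org/CVRF/schema/prod/1.1"


def full_product_names(root):
    """Return (ProductID, name) pairs for every FullProductName, at any depth."""
    return [(node.get("ProductID"), (node.text or "").strip())
            for node in root.iter("{%s}FullProductName" % PROD_NS)]


def affected_by_inventory(root, inventory):
    """Hypothetical helper: products whose name contains any inventory string."""
    wanted = [item.lower() for item in inventory]
    return [(pid, name) for pid, name in full_product_names(root)
            if any(w in name.lower() for w in wanted)]


sample = ('<ProductTree xmlns="%s">'
          '<FullProductName ProductID="CVRF1.1-PID-0001">'
          'Cisco RVS4000 Gigabit Security Router version 1</FullProductName>'
          '<FullProductName ProductID="CVRF1.1-PID-0003">'
          'Cisco WRVS4400N Wireless-N Gigabit Security Router version 1.0</FullProductName>'
          '</ProductTree>' % PROD_NS)
print(affected_by_inventory(ET.fromstring(sample), ["wrvs4400n"]))
# [('CVRF1.1-PID-0003', 'Cisco WRVS4400N Wireless-N Gigabit Security Router version 1.0')]
```

Run over every document in a vendor bundle, a non-empty result flags the advisories worth reading first.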

Want to quickly check whether any of the vulnerabilities have high CVSS scores? We can pull out the CVSS Score Sets from each vulnerability and sort the base scores:

[sjc-vpn6-826:~/PycharmProjects/cvrfparse] mike% ./cvrfparse/cvrfparse.py --file cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000.xml --strip-ns --vuln CVSSScoreSets | grep BaseScore | sort -k2,2nr
[BaseScore] 9.3
[BaseScore] 9.0
[BaseScore] 5.0
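The same check is easy to script in Python, where the comparison is explicitly numeric, so a 10.0 ranks above a 9.3 (a plain reverse text sort would put it last). This sketch assumes the CVRF 1.1 vulnerability namespace URI shown in the code; the trimmed sample mimics a CVSSScoreSets/ScoreSet/BaseScore nesting, although iter() finds BaseScore at any depth. The 7.0 default threshold follows the CVSS v2 “High” range (7.0 to 10.0), and high_scores() is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumption: the CVRF 1.1 vulnerability namespace URI
VULN_NS = "http://www.icasi.org/CVRF/schema/vuln/1.1"


def base_scores(root):
    """Collect every BaseScore in the document as a float, highest first."""
    scores = [float(node.text) for node in root.iter("{%s}BaseScore" % VULN_NS)]
    return sorted(scores, reverse=True)


def high_scores(root, threshold=7.0):
    """Return the scores at or above threshold (CVSS v2 rates 7.0-10.0 as High)."""
    return [s for s in base_scores(root) if s >= threshold]


sample = ('<doc xmlns:vuln="%s">'
          '<vuln:Vulnerability Ordinal="1"><vuln:CVSSScoreSets><vuln:ScoreSet>'
          '<vuln:BaseScore>5.0</vuln:BaseScore>'
          '</vuln:ScoreSet></vuln:CVSSScoreSets></vuln:Vulnerability>'
          '<vuln:Vulnerability Ordinal="2"><vuln:CVSSScoreSets><vuln:ScoreSet>'
          '<vuln:BaseScore>10.0</vuln:BaseScore>'
          '</vuln:ScoreSet></vuln:CVSSScoreSets></vuln:Vulnerability>'
          '</doc>' % VULN_NS)
root = ET.fromstring(sample)
print(base_scores(root))  # [10.0, 5.0]
print(high_scores(root))  # [10.0]
```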

Cvrfparse Command-line Examples: Vulnerability Container Collation

As we learned above, cvrfparse can also collate the vulnerabilities in a document into separate files by Vulnerability Ordinal:

[sb:~/cvrfparse] mike% ./cvrfparse/cvrfparse.py --file cvrfparse/sample-xml/CVRF-1.1-cisco-sa-20110525-rvs4000.xml --strip-ns --collate
[sb:~/cvrfparse] mike% ls -sh cvrfparse*txt
24 cvrfparse-Cisco_Security_Advisory:_Cisco_RVS4000_and_WRVS4400N_Web_Management_Interface_Vulnerabilities-ordinal-1.txt
24 cvrfparse-Cisco_Security_Advisory:_Cisco_RVS4000_and_WRVS4400N_Web_Management_Interface_Vulnerabilities-ordinal-2.txt
24 cvrfparse-Cisco_Security_Advisory:_Cisco_RVS4000_and_WRVS4400N_Web_Management_Interface_Vulnerabilities-ordinal-3.txt
[sb:~/cvrfparse] mike% head cvrfparse-Cisco_Security_Advisory:_Cisco_RVS4000_and_WRVS4400N_Web_Management_Interface_Vulnerabilities-ordinal-1.txt
[Vulnerability] 
(Ordinal: 1)

[Title] Retrieval of the configuration file
[Notes] 
[Note] The Cisco RVS4000 and WRVS4400N Gigabit Security Routers deliver high-speed network 
            access and IPsec VPN capabilities for small businesses. They also provides firewall 
            and intrusion prevention capabilities.
            The Cisco RVS4000 and WRVS4400N Gigabit Security Routers contains a web management 
            interface vulnerability:

Nicely done. Had we invoked cvrfparse this way on a CVRF document with no Vulnerability containers (which is perfectly valid), the program would quietly, and correctly, do nothing.

Under the Hood

As in my past technical blogs where I’ve released code, I like to pick a linchpin block of code and discuss it. With cvrfparse, we’ll look at a few interesting sections: the three functions that do most of the work, namely validation, parsing and vulnerability collation.

Validation

The validation function accepts two arguments: a file object containing the unparsed schema document, and an lxml-parsed (and consequently well-formed) CVRF document. The function first attempts to parse the schema into an ElementTree object. Provided the schema is well-formed (what a disaster if your schema were broken!), control proceeds to the next line, which calls XMLSchema() to turn the parsed schema into an XML Schema validator. The validator’s assertValid() method raises a DocumentInvalid exception if validation fails, and its error_log object tells us why. Assuming all goes well, no exception is raised and the function returns the tuple (True, "Valid").

from lxml import etree


def cvrf_validate(f, cvrf_doc):
    """
    Validates a CVRF document
    f: file object containing the schema
    cvrf_doc: the parsed CVRF ElementTree object
    returns: a tuple containing the return code (True for valid / False for
    invalid) and a reason for the code
    """
    try:
        xmlschema_doc = etree.parse(f)
    except etree.XMLSyntaxError as e:
        # The schema itself is not well-formed; report only the fatal errors
        log = e.error_log.filter_from_level(etree.ErrorLevels.FATAL)
        return False, 'Parsing error, schema document "{0}" is not well-formed: {1}'.format(f.name, log)
    # Compile the parsed schema into an XML Schema validator
    xmlschema = etree.XMLSchema(xmlschema_doc)

    try:
        # assertValid() raises DocumentInvalid if the document does not validate
        xmlschema.assertValid(cvrf_doc)
        return True, "Valid"
    except etree.DocumentInvalid:
        # error_log records where and why validation failed
        return False, xmlschema.error_log

Parsing

The parsing function is even simpler. It also accepts two arguments: the parsed CVRF document and a list of the elements the user wishes to emit. It returns a dictionary that maps where to write the output onto a list of the items to write. The function kicks off by declaring an empty list to store the items the user wants to emit. From there, it makes liberal use of Python’s versatile workhorse iteration construct, the for loop. The top-level loop iterates over each (namespace-qualified) element name in parsables. For each one, the lxml iter() method walks the whole document and yields only the nodes whose tag matches that element. Finally, we iterate over each matching node and all of its descendants and add everything we find to the items list. When we’ve exhausted all of the items in parsables, we return a dictionary whose key names the output destination (currently hardcoded to standard output) and whose value is the list of items to write.

def cvrf_parse(cvrf_doc, parsables):
    """
    Parse a cvrf_doc and return a list of elements as determined by parsables
    cvrf_doc: the parsed CVRF ElementTree object
    parsables: list of elements to parse from a CVRF doc
    returns: a dictionary of the format {filename:[item, ...]}
    """
    items = []
    for element in parsables:
        # Visit every node in the document whose tag matches this element
        for node in cvrf_doc.iter(element):
            # node.iter() yields the node itself, then all of its descendants
            for child in node.iter():
                items.append(child)
    # Hardcoded output for now, eventually make this user-tunable
    return {"stdout": items}
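Both cvrf_parse() and the collation function discussed next hand back a {filename: [item, ...]} dictionary, so something has to turn those items into the bracketed lines shown in the examples earlier. The writer below is a hypothetical sketch of that step, not cvrfparse’s own output code: it treats the key "stdout" as standard output and any other key as a file name, and it renders each element as “[Tag] text”, optionally dropping the namespace the way --strip-ns does. It runs with the standard library; the same calls work on lxml elements.

```python
import sys
import xml.etree.ElementTree as ET


def format_lines(items, strip=False):
    """Render elements as '[Tag] text' lines, optionally dropping the '{namespace}' prefix."""
    lines = []
    for elem in items:
        if not isinstance(elem.tag, str):
            # Comments and processing instructions have a non-string .tag; skip them
            continue
        tag = elem.tag.split("}", 1)[-1] if strip else elem.tag
        lines.append("[{0}] {1}".format(tag, (elem.text or "").strip()))
    return lines


def emit(results, strip=False):
    """Write each {filename: [element, ...]} entry; the key "stdout" means standard output."""
    for filename, items in results.items():
        text = "".join(line + "\n" for line in format_lines(items, strip))
        if filename == "stdout":
            sys.stdout.write(text)
        else:
            with open(filename, "w") as out:
                out.write(text)


doc = ET.fromstring('<DocumentType xmlns="http://www.icasi.org/CVRF/schema/cvrf/1.1">'
                    'Security Advisory</DocumentType>')
emit({"stdout": list(doc.iter())}, strip=True)  # prints: [DocumentType] Security Advisory
```

Keeping formatting separate from writing means the collated per-ordinal files and standard output share one rendering path.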

Vulnerability Collation

As our denouement, let’s have a look at the vulnerability collation function, cvrf_collate_vuln(). It accepts only a single, familiar argument, the parsed CVRF document, and returns a dictionary in exactly the same format as cvrf_parse(). The function starts by declaring an empty dictionary to hold the results. Next on its to-do list is building the root of the filenames in which the collation process will store the goods. We use the findtext() method, part of ElementTree’s XPath-like query language ElementPath, to find the first (and, assuming the document is valid, only) DocumentTitle element and return its text. If you look closely, you’ll notice that the rather long chain of string methods actually operates on two different strings. The first chain removes the curly braces from the namespace string to produce the prefix-to-URI mapping that findtext() expects. The second preps the filename by stripping leading and trailing whitespace from the Document Title and replacing any “internal” spaces with underscores.

Next, the function uses the findall() method, which issues an XPath-like query to return all matching elements. In this case, we want to iterate over each Vulnerability element. For each one, we build the specific filename: the string literal “cvrfparse-”, followed by the title we just created, followed by the string literal “-ordinal-”, followed by the vulnerability’s ordinal, and capped with the string literal “.txt”. The function then uses the iter() method we saw above in a list comprehension and stores the resulting list in the dictionary, keyed by the filename.

def cvrf_collate_vuln(cvrf_doc):
    """
    Zip through a cvrf_doc and return all vulnerability elements collated by ordinal
    cvrf_doc: the parsed CVRF ElementTree object
    returns: a dictionary of the format {filename:[item, ...], filename:[item, ...]}
    """
    # CVRF_Syntax is defined elsewhere in cvrfparse; its NAMESPACES values are
    # namespace URIs wrapped in curly braces, e.g. "{uri}"
    results = {}
    # Obtain document title to use in the filename(s), tiptoeing around the curly braces in our NS definition
    document_title = cvrf_doc.findtext(
        "cvrf:DocumentTitle",
        namespaces={"cvrf": CVRF_Syntax.NAMESPACES["CVRF"].replace("{", "").replace("}", "")}
    ).strip().replace(" ", "_")

    # Constrain XPath search to the Vulnerability container
    for node in cvrf_doc.findall('.//' + CVRF_Syntax.NAMESPACES['VULN'] + 'Vulnerability'):
        # Create filename based on ordinal number to use as a key for results dictionary
        filename = 'cvrfparse-' + document_title + '-ordinal-' + node.attrib['Ordinal'] + '.txt'
        # Collect the Vulnerability element and all of its descendants into a list
        results[filename] = [child for child in node.iter()]

    return results
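Because the function above depends on CVRF_Syntax from the rest of cvrfparse, it cannot run on its own. The sketch below is a self-contained approximation using the standard library’s xml.etree.ElementTree, which supports the same findtext(), findall() and iter() calls. The literal namespace URIs stand in for CVRF_Syntax.NAMESPACES (the vulnerability URI is an assumption), collate_vulns() is an illustrative name, and the sample document is trimmed to the bare minimum. It also demonstrates the behavior described earlier: a document with no Vulnerability containers yields an empty dictionary, so nothing gets written.

```python
import xml.etree.ElementTree as ET

# Assumed CVRF 1.1 namespace URIs, standing in for CVRF_Syntax.NAMESPACES
NS = {"cvrf": "http://www.icasi.org/CVRF/schema/cvrf/1.1",
      "vuln": "http://www.icasi.org/CVRF/schema/vuln/1.1"}


def collate_vulns(root):
    """Map 'cvrfparse-<title>-ordinal-<n>.txt' to the elements of each Vulnerability."""
    # DocumentTitle is mandatory in a valid CVRF document, so findtext() finds it
    title = root.findtext("cvrf:DocumentTitle", namespaces=NS).strip().replace(" ", "_")
    results = {}
    for node in root.findall(".//vuln:Vulnerability", namespaces=NS):
        filename = "cvrfparse-" + title + "-ordinal-" + node.get("Ordinal") + ".txt"
        results[filename] = list(node.iter())
    return results


sample = ('<cvrfdoc xmlns="%s" xmlns:vuln="%s">'
          '<DocumentTitle> Demo Advisory </DocumentTitle>'
          '<vuln:Vulnerability Ordinal="1"><vuln:Title>First</vuln:Title></vuln:Vulnerability>'
          '<vuln:Vulnerability Ordinal="2"><vuln:Title>Second</vuln:Title></vuln:Vulnerability>'
          '</cvrfdoc>' % (NS["cvrf"], NS["vuln"]))
print(sorted(collate_vulns(ET.fromstring(sample))))
# ['cvrfparse-Demo_Advisory-ordinal-1.txt', 'cvrfparse-Demo_Advisory-ordinal-2.txt']
```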

Conclusion

We looked at the newly open-sourced tool cvrfparse, a validating parser for CVRF. It’s up for grabs at PyPI and GitHub! As work on the tool continues, your comments, critiques, and pull requests are welcome.