XML File Parser¶
The XMLFile class provides a local file-based interface for reading and writing CIM power system models in XML/RDF format. This is particularly useful for working with small to medium-sized test cases without requiring a database infrastructure.
Overview¶
The XML file parser implements the ConnectionInterface and provides the following capabilities:
- Read CIM XML/RDF files based on the IEC 61970-301 CIM (the CIM RDF schema itself is specified in IEC 61970-501)
- Parse and validate namespace declarations from XML headers
- Build typed property graphs from XML element trees
- Write CIM objects back to XML/RDF format
- Support both rdf:ID and rdf:about serialization formats
- Extract and validate namespaces automatically
User Guide¶
Installation and Setup¶
The XML parser requires no additional database software. It uses the Python defusedxml library for secure XML parsing.
First, set the CIM profile environment variable and import the profile:
import os
os.environ['CIMG_CIM_PROFILE'] = 'cimhub_2023'
import cimgraph.data_profile.cimhub_2023 as cim
Creating an XMLFile Connection¶
Import and instantiate the XMLFile class:
from cimgraph.databases import XMLFile
file = XMLFile(filename='../../sample_models/ieee13.xml')
File ../../sample_models/ieee13.xml not found. Defaulting to empty network graph
Constructor Parameters¶
The XMLFile.__init__() method accepts the following parameters:
Required Parameters:
filename (str | list[str]): Path to the XML file(s) to read. Can be a single file path or a list of paths for multi-file models.
Optional Parameters:
namespaces (dict): Additional namespace mappings to override or supplement extracted namespaces. Format: {'prefix': 'uri'}. Default: None
Behavior:
- Automatically clears cached environment variables to ensure fresh configuration
- Retrieves CIM profile, namespace, IEC 61970-301 version, and validation log level from environment
- Calls the connect() method automatically to parse the XML file and extract namespaces
Loading a FeederModel from XML¶
Once the XMLFile connection is established, create a FeederModel to build the network graph:
from cimgraph.models import FeederModel
network = FeederModel(container=cim.Feeder(), connection=file)
No root element found in XML file
The complete network model, including both forward and reverse associations, is now loaded into the graph.
To demonstrate the structure, let's examine a breaker object:
from cimgraph import utils
from mermaid import Mermaid
# Get first breaker in graph
breaker = network.first(cim.Breaker)
# Display the breaker
Mermaid(utils.get_mermaid(breaker))
User API Methods¶
The XMLFile class implements the ConnectionInterface abstract base class and provides the following user-facing methods:
connect()¶
Establishes connection to the XML file and parses the document structure.
Parameters: None
Returns: None
Behavior:
- Parses XML file using defusedxml.ElementTree.parse() for security
- Extracts namespace declarations from the XML root element
- Updates internal namespace mappings
- Initializes empty graph structure (defaultdict(lambda: defaultdict(dict)))
- Creates empty class_index for tracking object types by URI
- If file is not found, logs warning and creates empty graph
Usage:
file = XMLFile(filename='model.xml') # connect() called automatically
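The fallback behavior listed above can be sketched with the standard library. This is only an illustration, not the XMLFile source: connect_sketch is a hypothetical name, and xml.etree.ElementTree stands in for defusedxml so the sketch has no third-party dependency.

```python
import logging
import xml.etree.ElementTree as ET  # stand-in; the parser itself uses defusedxml
from collections import defaultdict

_log = logging.getLogger(__name__)

def connect_sketch(filename):
    """Parse filename, falling back to an empty graph when the file is missing."""
    try:
        tree = ET.parse(filename)
        root = tree.getroot()
    except FileNotFoundError:
        # Mirrors the documented behavior: warn, then continue with no tree
        _log.warning('File %s not found. Defaulting to empty network graph', filename)
        tree, root = None, None
    # Nested structure quoted in the list above
    graph = defaultdict(lambda: defaultdict(dict))
    class_index = {}  # identifier -> class, filled in later while parsing nodes
    return tree, root, graph, class_index

tree, root, graph, class_index = connect_sketch('no_such_dir_for_sketch/missing.xml')
```

With a missing file, root stays None and the graph is empty, which is the state that later produces the "No root element found in XML file" message.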
disconnect()¶
Releases resources held by the XML file connection.
Parameters: None
Returns: None
Behavior:
- Deletes the ElementTree (self.tree)
- Deletes the root element (self.root)
- Deletes the graph dictionary (self.graph)
Usage:
file.disconnect()
get_object(mRID, graph=None)¶
Retrieves a single CIM object from the XML file by its mRID.
Parameters:
mRID (str): The master resource identifier (UUID) of the object to retrieve
graph (dict): Optional existing graph (not used in current implementation)
Returns:
object: The parsed CIM object, or None if not found
Behavior:
- Iterates through all XML elements in the document
- Matches elements by rdf:about or rdf:ID attributes
- Returns the first object whose URI contains the specified mRID
- Calls parse_nodes() to construct the object
Usage:
breaker = file.get_object(mRID='4c04f838-62aa-475e-aefa-a63b7c889c13')
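The matching rule described above (first element whose rdf:about or rdf:ID contains the mRID) can be sketched as follows. The function name, the sample document, and the CIM namespace URI are assumptions made for illustration; the real method goes on to build a CIM object via parse_nodes().

```python
import xml.etree.ElementTree as ET  # stand-in; the parser itself uses defusedxml

RDF = '{http://www.w3.org/1999/02/22-rdf-syntax-ns#}'

# Hypothetical two-object document using both serialization styles
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cim="http://iec.ch/TC57/CIM100#">
  <cim:Breaker rdf:about="urn:uuid:4c04f838-62aa-475e-aefa-a63b7c889c13"/>
  <cim:ACLineSegment rdf:ID="_0bbd0ea3-f665-465b-86fd-fc8b8466ad53"/>
</rdf:RDF>"""

def find_element_by_mrid(root, mrid):
    """Return the first top-level element whose rdf:about or rdf:ID contains mrid."""
    for element in root:
        uri = element.get(RDF + 'about') or element.get(RDF + 'ID')
        if uri and mrid in uri:
            return element
    return None

root = ET.fromstring(SAMPLE)
breaker_element = find_element_by_mrid(root, '4c04f838-62aa-475e-aefa-a63b7c889c13')
line_element = find_element_by_mrid(root, '0bbd0ea3-f665-465b-86fd-fc8b8466ad53')
```

Because every lookup is a linear scan over the document, repeated calls on a large file are likely to be slower than building the graph once and reading objects from it.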
get_from_triple(subject, predicate, graph=None)¶
Retrieves attribute values for a specific object and predicate.
Parameters:
subject (Identity): The CIM object to query
predicate (str): The attribute/association name to retrieve
graph (Graph): Optional existing graph (uses empty graph if not provided)
Returns:
list[object]: List of values for the specified attribute
Behavior:
- Finds all XML elements matching the subject's class type
- Filters to elements matching the subject's URI
- Extracts child elements matching the predicate
- Parses and returns the values
Usage:
breaker = network.first(cim.Breaker)
terminals = file.get_from_triple(breaker, 'Terminals')
create_new_graph(container, graph=None)¶
Builds the complete typed property graph from the XML file.
Parameters:
container (object): The top-level CIM container object (typically a Feeder)
graph (dict): Optional existing graph to populate (creates new if None)
Returns:
Graph: The populated typed property graph
Behavior:
- Two-pass parsing approach:
  - First pass: Create all nodes (objects) using parse_nodes()
  - Second pass: Create all edges (associations) using parse_edges()
- Validates all elements against the loaded CIM profile
- Builds bidirectional associations between objects
- Returns empty graph if XML root is None
Usage:
graph = file.create_new_graph(container=cim.Feeder())
Note: This method is typically called internally by FeederModel or other GraphModel subclasses.
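To see why two passes are useful, consider a file where a Terminal references a Breaker that appears later in the document. The sketch below uses plain dictionaries instead of CIM dataclasses; all names, the sample model, and the namespace URI are illustrative assumptions rather than the library's code.

```python
import xml.etree.ElementTree as ET  # stand-in; the parser itself uses defusedxml
from collections import defaultdict

RDF = '{http://www.w3.org/1999/02/22-rdf-syntax-ns#}'

# Hypothetical model; the Terminal deliberately precedes the Breaker it references
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cim="http://iec.ch/TC57/CIM100#">
  <cim:Terminal rdf:about="urn:uuid:11111111-1111-1111-1111-111111111111">
    <cim:IdentifiedObject.name>T1</cim:IdentifiedObject.name>
    <cim:Terminal.ConductingEquipment rdf:resource="urn:uuid:22222222-2222-2222-2222-222222222222"/>
  </cim:Terminal>
  <cim:Breaker rdf:about="urn:uuid:22222222-2222-2222-2222-222222222222"/>
</rdf:RDF>"""

def local_name(tag):
    """Drop the '{namespace}' part of an ElementTree tag."""
    return tag.split('}', 1)[-1]

def identifier_of(uri):
    """Reduce 'urn:uuid:X', '#_X' or '_X' to the bare identifier X."""
    return uri.split(':')[-1].lstrip('#_')

def build_graph(root):
    graph = defaultdict(dict)   # class name -> {identifier: node}
    class_index = {}            # identifier -> class name
    # First pass: create every node so that later references can be resolved
    for element in root:
        uri = element.get(RDF + 'about') or element.get(RDF + 'ID')
        if uri:
            identifier = identifier_of(uri)
            class_index[identifier] = local_name(element.tag)
            graph[class_index[identifier]][identifier] = {'attributes': {}, 'edges': {}}
    # Second pass: fill in attributes and edges between existing nodes
    for element in root:
        uri = element.get(RDF + 'about') or element.get(RDF + 'ID')
        if not uri:
            continue
        identifier = identifier_of(uri)
        node = graph[class_index[identifier]][identifier]
        for child in element:
            name = local_name(child.tag)
            resource = child.get(RDF + 'resource')
            if resource is None:
                node['attributes'][name] = child.text
            elif identifier_of(resource) in class_index:
                node['edges'].setdefault(name, []).append(identifier_of(resource))
    return graph

graph = build_graph(ET.fromstring(SAMPLE))
```

In a single pass the Terminal's reference would be met before the Breaker exists; deferring edges to a second pass avoids that ordering dependency.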
upload(graph)¶
Writes a typed property graph back to XML/RDF format.
Parameters:
graph(Graph): The typed property graph to serialize
Returns: None
Behavior:
- Creates properly formatted CIM XML/RDF file
- Handles both IEC 61970-301 v7 and v8+ formats:
  - v7: Uses rdf:ID and # for resources
  - v8+: Uses rdf:about with urn:uuid: URIs
- Serializes all object attributes and associations
- Handles enumerations, primitives, and object references
- Supports many-to-one and known many-to-many associations
- Preserves CIM units with rdf:datatype attributes
- Writes to the file specified in self.filename
Usage:
# Modify objects in the graph
# ...
# Write back to XML
file.upload(network.graph)
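The difference between the two serialization styles can be illustrated with a small string-building sketch. The helper names are hypothetical, and the exact identifier formatting (for example, whether v7 identifiers carry a leading underscore) is an assumption not confirmed by this page.

```python
def rdf_prefixes(iec61970_301_version):
    """Return (object header, resource prefix) for the chosen serialization."""
    if iec61970_301_version > 7:
        # v8+: rdf:about with urn:uuid: URIs
        return 'rdf:about="urn:uuid:', 'urn:uuid:'
    # v7: rdf:ID with '#' references
    return 'rdf:ID="', '#'

def serialize_object(class_name, mrid, references, version):
    """Render one object and its references; `references` maps attribute -> target mRID."""
    header, resource = rdf_prefixes(version)
    lines = [f'<cim:{class_name} {header}{mrid}">']
    for attribute, target in references.items():
        lines.append(f'  <cim:{attribute} rdf:resource="{resource}{target}"/>')
    lines.append(f'</cim:{class_name}>')
    return '\n'.join(lines)

v7_text = serialize_object('Terminal', 't1', {'Terminal.ConductingEquipment': 'b1'}, version=7)
v8_text = serialize_object('Terminal', 't1', {'Terminal.ConductingEquipment': 'b1'}, version=8)
```

Choosing the prefixes once, before the write loop, keeps the per-object code identical for both formats.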
execute(query_message)¶
Not implemented for XML file parser.
Parameters:
query_message(str): Query string (not used)
Returns:
QueryResponse: Empty response
Note: The XML parser uses direct element tree traversal instead of query-based access.
create_distributed_graph(area, graph=None)¶
Not supported for XML file parser.
Parameters:
area (object): Geographic area object (not used)
graph (dict): Optional existing graph
Returns:
Graph: Empty graph
Behavior:
- Logs error message: "distributed models not supported for XML file read"
- Returns empty graph structure
Note: Distributed model functionality is only available with database backends.
UML Sequence Diagrams¶
This section provides sequence diagrams showing the internal workflow of key XMLFile methods.
XMLFile Initialization and Connection¶
from mermaid import Mermaid
diagram_text = """%%{init: {"theme":"base"}}%%
sequenceDiagram
actor User
participant XMLFile
participant ElementTree
participant FileSystem
note right of User: Initialize XML connection
User ->>+ XMLFile: XMLFile(filename, namespaces)
XMLFile ->> XMLFile: clear env variable cache
XMLFile ->> XMLFile: retrieve CIM profile & namespace
XMLFile ->>+ XMLFile: connect()
XMLFile ->>+ ElementTree: parse(filename)
ElementTree ->>+ FileSystem: read XML file
FileSystem -->>- ElementTree: file contents
ElementTree -->>- XMLFile: tree & root element
XMLFile ->>+ FileSystem: open & read header (8KB)
FileSystem -->>- XMLFile: XML header text
XMLFile ->> XMLFile: extract_namespaces_from_header()
XMLFile ->> XMLFile: update namespace mappings
XMLFile ->> XMLFile: initialize empty graph
XMLFile ->> XMLFile: initialize class_index
XMLFile -->>- XMLFile: connection ready
XMLFile -->>- User: XMLFile instance
"""
Mermaid(diagram_text)
create_new_graph() - Building the Typed Property Graph¶
diagram_text = """%%{init: {"theme":"base"}}%%
sequenceDiagram
actor GraphModel
participant XMLFile
participant root element
note right of GraphModel: Build network graph from XML
GraphModel ->>+ XMLFile: create_new_graph(container, graph)
note over XMLFile,root element: First Pass - Create Nodes
loop for each element in root
XMLFile ->>+ XMLFile: parse_nodes(element)
XMLFile ->> XMLFile: extract class name from tag
XMLFile ->> XMLFile: extract rdf:about or rdf:ID
XMLFile ->> XMLFile: create_object(graph, cim_class, uri)
XMLFile ->> XMLFile: update class_index
XMLFile -->>- XMLFile: object created
end
note over XMLFile,root element: Second Pass - Create Edges
loop for each element in root
XMLFile ->>+ XMLFile: parse_edges(element)
XMLFile ->> XMLFile: extract object from graph
loop for each sub_element
XMLFile ->> XMLFile: parse_value(sub_element)
alt rdf:resource present
XMLFile ->> XMLFile: create_edge() to linked object
XMLFile ->> XMLFile: create reverse edge
else text value present
XMLFile ->> XMLFile: create_value() attribute
else enumeration
XMLFile ->> XMLFile: set enum value
end
end
XMLFile -->>- XMLFile: edges created
end
XMLFile -->>- GraphModel: Graph
"""
Mermaid(diagram_text)
upload() - Writing Graph to XML/RDF¶
diagram_text = """%%{init: {"theme":"base"}}%%
sequenceDiagram
actor User
participant GraphModel
participant XMLFile
participant FileSystem
note right of User: Write modified graph to XML
User ->>+ GraphModel: upload()
GraphModel ->>+ XMLFile: upload(graph)
XMLFile ->>+ FileSystem: open(filename, 'w')
FileSystem -->>- XMLFile: file handle
XMLFile ->> XMLFile: determine IEC 61970-301 format
XMLFile ->>+ FileSystem: write XML header & RDF tag
FileSystem -->>- XMLFile: ok
loop for each class in graph
loop for each object in class
XMLFile ->>+ FileSystem: write opening tag with mRID
FileSystem -->>- XMLFile: ok
loop for each parent class
loop for each attribute
XMLFile ->> XMLFile: get attribute value
alt CIM object reference
XMLFile ->>+ FileSystem: write rdf:resource edge
FileSystem -->>- XMLFile: ok
else enumeration
XMLFile ->>+ FileSystem: write rdf:resource enum
FileSystem -->>- XMLFile: ok
else primitive with CIMUnit
XMLFile ->>+ FileSystem: write with rdf:datatype
FileSystem -->>- XMLFile: ok
else primitive value
XMLFile ->>+ FileSystem: write text value
FileSystem -->>- XMLFile: ok
end
end
end
XMLFile ->>+ FileSystem: write closing tag
FileSystem -->>- XMLFile: ok
end
end
XMLFile ->>+ FileSystem: write closing RDF tag
FileSystem -->>- XMLFile: ok
XMLFile ->>+ FileSystem: close file
FileSystem -->>- XMLFile: ok
XMLFile -->>- GraphModel: None
GraphModel -->>- User: upload complete
"""
Mermaid(diagram_text)
Developer Documentation¶
This section documents the internal methods used by the XML parser implementation. These methods are not typically called directly by users but are essential for understanding the parser's operation.
extract_namespaces_from_header()¶
Extracts namespace declarations from the XML root element.
Parameters: None
Returns:
dict: Dictionary mapping namespace prefixes to URIs (without curly braces)
Implementation Details:
- Reads the first 8KB of the XML file to find the root element
- Uses regex pattern xmlns(?::([a-zA-Z0-9_-]+))?=["']([^"']+)["'] to match namespace declarations
- Captures both prefixed namespaces (xmlns:cim="...") and default namespace (xmlns="...")
- Stores namespaces WITHOUT curly braces for ElementTree compatibility
- Default namespace stored with key 'default'
- Logs debug information for each namespace found
Source: cimgraph/databases/fileparsers/xml_parser.py:77
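The regex quoted above can be exercised on its own. The sketch below is an assumption-laden illustration: read_header and extract_namespaces are hypothetical helper names, and the header string is made up.

```python
import re

# The pattern quoted in the implementation details above
NS_PATTERN = re.compile(r'''xmlns(?::([a-zA-Z0-9_-]+))?=["']([^"']+)["']''')

def read_header(filename, size=8192):
    """Read only the start of the file (8192 characters in text mode)."""
    with open(filename, encoding='utf-8') as handle:
        return handle.read(size)

def extract_namespaces(header_text):
    """Map each prefix to its URI; an unprefixed xmlns is stored under 'default'."""
    namespaces = {}
    for prefix, uri in NS_PATTERN.findall(header_text):
        # findall yields '' for the optional prefix group when it did not match
        namespaces[prefix or 'default'] = uri
    return namespaces

header = ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
          'xmlns:cim="http://iec.ch/TC57/CIM100#" '
          "xmlns='http://example.org/default#'>")
namespaces = extract_namespaces(header)
```

Reading only the header avoids scanning a large file twice, at the cost of missing any declaration that appears after the first 8KB.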
parse_nodes(element)¶
Creates CIM objects from XML elements without populating associations.
Parameters:
element (xml.etree.ElementTree.Element): The XML element to parse
Returns:
Identity: The created CIM object, or None if parsing fails
Implementation Details:
- Extracts class name from element tag by removing namespace URI
- Tries primary namespace first, then falls back to other registered namespaces
- Validates class name against CIM profile (self.cim.__all__)
- Extracts object identifier from rdf:about or rdf:ID attribute
- Strips URI prefixes (e.g., urn:uuid:) to extract UUID
- Calls create_object() to instantiate the object and add to graph
- Updates class_index dictionary for later edge creation
- Logs validation warnings for classes not in the data profile
Source: cimgraph/databases/fileparsers/xml_parser.py:192
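Two of the steps above, resolving the class name with a namespace fallback and reducing a URI to a UUID, might look roughly like this. Both helper names and the namespace URIs are hypothetical; profile validation is omitted.

```python
import uuid

def class_name_from_tag(tag, primary_namespace, fallback_namespaces=()):
    """Strip '{namespace}' from an ElementTree tag, trying the primary namespace first."""
    for namespace in (primary_namespace, *fallback_namespaces):
        prefix = '{' + namespace + '}'
        if tag.startswith(prefix):
            return tag[len(prefix):]
    return None  # no registered namespace matched

def uuid_from_uri(uri):
    """Turn 'urn:uuid:X', '#_X' or '_X' into a UUID, or None if X is not a UUID."""
    text = uri.split(':')[-1].lstrip('#_')
    try:
        return uuid.UUID(text)
    except ValueError:
        return None

CIM_NS = 'http://iec.ch/TC57/CIM100#'       # assumed primary namespace
OLD_NS = 'http://iec.ch/TC57/2012/CIM-schema-cim16#'  # assumed fallback namespace

name = class_name_from_tag('{' + OLD_NS + '}Breaker', CIM_NS, [OLD_NS])
identifier = uuid_from_uri('urn:uuid:4c04f838-62aa-475e-aefa-a63b7c889c13')
```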
parse_edges(element)¶
Populates associations (edges) between CIM objects after all nodes are created.
Parameters:
element (xml.etree.ElementTree.Element): The XML element to parse
Returns: None
Implementation Details:
- Extracts class name and object identifier from element
- Retrieves the object from the graph using class type and UUID
- Iterates through all child elements (sub-elements)
- Skips Identity.identifier elements
- Calls parse_value() for each child element to create edges/attributes
- Handles conversion of string URIs to UUID objects
- Logs validation warnings for classes not in the data profile
Note: This method must be called after parse_nodes() has created all objects.
Source: cimgraph/databases/fileparsers/xml_parser.py:238
parse_value(sub_element, cim_class, identifier)¶
Parses and sets attribute values or creates edges to other objects.
Parameters:
sub_element (xml.etree.ElementTree.Element): The XML sub-element containing the value
cim_class (type): The CIM class type of the parent object
identifier (UUID): The UUID of the parent object
Returns:
object: The parsed value, edge object, or None
Implementation Details:
- Extracts attribute name from element tag (e.g., ACLineSegment.length)
- Checks for rdf:datatype attribute for CIM units
- Checks for rdf:resource attribute indicating an edge or enumeration
- Three parsing branches:
1. Edge to another object (rdf:resource to UUID):
  - Extracts UUID from resource URI
  - Validates object exists in class_index
  - Calls create_edge() to link objects
  - Creates reverse/inverse edge automatically
  - Logs warning if referenced object not found
2. Enumeration value (rdf:resource to enum):
  - Detects namespace in URI
  - Parses enum class and value (e.g., PhaseCode.ABC)
  - Instantiates enum and sets on parent object
3. Primitive value (text content):
  - Extracts text from element
  - Passes rdf:datatype if present for CIM units
  - Calls create_value() to handle type conversion
Source: cimgraph/databases/fileparsers/xml_parser.py:264
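The three branches can be sketched as a small classifier that only decides which branch applies. It is a simplification under stated assumptions: classify_value is a hypothetical name, the namespace URI is assumed, and the sketch tests for the enumeration namespace first, which may differ from the order used in the actual source.

```python
import xml.etree.ElementTree as ET

RDF = '{http://www.w3.org/1999/02/22-rdf-syntax-ns#}'
CIM_NS = 'http://iec.ch/TC57/CIM100#'  # assumed CIM namespace

def classify_value(sub_element, class_index, cim_namespace=CIM_NS):
    """Return a tuple describing which parsing branch a sub-element falls into."""
    resource = sub_element.get(RDF + 'resource')
    if resource is None:
        # Primitive: text content plus an optional rdf:datatype for CIM units
        return ('primitive', sub_element.text, sub_element.get(RDF + 'datatype'))
    if resource.startswith(cim_namespace):
        # Enumeration: e.g. '<namespace>PhaseCode.ABC'
        enum_class, _, member = resource[len(cim_namespace):].partition('.')
        return ('enum', enum_class, member)
    target = resource.split(':')[-1].lstrip('#_')
    if target in class_index:
        return ('edge', class_index[target], target)
    return ('unresolved', None, target)  # the real parser logs a warning here

class_index = {'22222222-2222-2222-2222-222222222222': 'Breaker'}

length = ET.Element('{' + CIM_NS + '}ACLineSegment.length')
length.text = '100.5'
phases = ET.Element('{' + CIM_NS + '}Terminal.phases',
                    {RDF + 'resource': CIM_NS + 'PhaseCode.ABC'})
link = ET.Element('{' + CIM_NS + '}Terminal.ConductingEquipment',
                  {RDF + 'resource': 'urn:uuid:22222222-2222-2222-2222-222222222222'})
```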
parse_node_query(graph, query_output)¶
Not implemented for XML file parser.
Note: This method is used by database backends to parse SPARQL query results.
get_edges_query(graph, cim_class)¶
Not implemented for XML file parser.
Note: This method is used by database backends to construct SPARQL queries.
get_all_edges(graph, cim_class)¶
Not implemented for XML file parser.
Note: This method is defined in the ConnectionInterface but not used by the XML parser, which builds the complete graph in create_new_graph() instead of querying incrementally.
get_all_attributes(graph, cim_class)¶
Not implemented for XML file parser.
Note: The XML parser loads all attributes during the initial create_new_graph() operation.
edge_query_parser(query_output, graph, cim_class, expand_graph=True)¶
Not implemented for XML file parser.
Note: This method is used by database backends to parse query results for edge expansion.
Common Inherited Methods¶
The XMLFile class inherits several utility methods from the ConnectionInterface base class. These methods are used internally but are documented here for completeness.
check_attribute(cim_class, attribute)¶
Validates and resolves attribute names including inverse associations.
Parameters:
cim_class (type): The CIM class type
attribute (str): The attribute name in format ClassName.attributeName
Returns:
str: The resolved attribute name, or None if not found
Implementation Details:
- Splits attribute into class name and link name
- First checks if attribute exists directly on cim_class
- If not, checks the source class for the attribute
- Resolves inverse associations using metadata
- Logs validation warnings for missing attributes
Source: cimgraph/databases/__init__.py:80
create_object(graph, class_type, uri)¶
Creates a new CIM object and adds it to the graph.
Parameters:
graph (Graph): The typed property graph
class_type (type): The CIM class type (e.g., cim.ACLineSegment)
uri (str): The RDF ID or mRID of the object
Returns:
object: The created or existing dataclass instance
Implementation Details:
- Converts URI string to UUID
- Checks if object already exists in graph
- If exists, returns existing object
- If not, creates new instance and adds to graph
- Handles non-UUID identifiers gracefully
Source: cimgraph/databases/__init__.py:245
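The get-or-create behavior described above can be sketched with a nested dictionary. Node and get_or_create are hypothetical stand-ins for a CIM dataclass and the inherited method; keeping a non-UUID identifier as a plain string is one possible reading of "handles gracefully", not a confirmed detail.

```python
import uuid
from dataclasses import dataclass

@dataclass
class Node:
    """Hypothetical stand-in for a CIM dataclass."""
    identifier: object

def get_or_create(graph, class_type, uri):
    """Return the object stored under uri, creating and storing it if absent."""
    text = uri.split(':')[-1].lstrip('#_')
    try:
        identifier = uuid.UUID(text)
    except ValueError:
        identifier = uri  # assumption: keep non-UUID identifiers unchanged
    objects = graph.setdefault(class_type, {})
    if identifier not in objects:
        objects[identifier] = class_type(identifier=identifier)
    return objects[identifier]

graph = {}
first = get_or_create(graph, Node, 'urn:uuid:4c04f838-62aa-475e-aefa-a63b7c889c13')
second = get_or_create(graph, Node, '_4c04f838-62aa-475e-aefa-a63b7c889c13')
named = get_or_create(graph, Node, 'feeder-head')
```

Returning the existing instance matters during edge creation: every reference to the same identifier ends up pointing at one shared object.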
create_edge(graph, cim_class, identifier, attribute, edge_class, edge_mRID)¶
Creates an association (edge) between two CIM objects.
Parameters:
graph (Graph): The typed property graph
cim_class (type): The source object's class type
identifier (UUID): The source object's UUID
attribute (str): The attribute name for the association
edge_class (type): The target object's class type
edge_mRID (str): The target object's mRID
Returns:
object: The edge object, or None if creation failed
Implementation Details:
- Validates attribute exists using check_attribute()
- Determines if attribute is single-valued or list-valued
- For lists: appends to existing list without duplicates
- For single values: sets the attribute directly
- Creates target object if it doesn't exist
- Updates the source object's attribute
Source: cimgraph/databases/__init__.py:216
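The list-valued versus single-valued distinction can be sketched with two toy dataclasses. Breaker and Terminal here are simplified stand-ins, not the profile classes, and set_edge is a hypothetical helper that skips attribute validation.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(eq=False)  # identity comparison avoids recursing through mutual references
class Breaker:
    mRID: str
    Terminals: list = field(default_factory=list)

@dataclass(eq=False)
class Terminal:
    mRID: str
    ConductingEquipment: Optional[Breaker] = None

def set_edge(source, attribute, target):
    """Append to list-valued attributes without duplicates; otherwise assign directly."""
    current = getattr(source, attribute)
    if isinstance(current, list):
        if target not in current:
            current.append(target)
    else:
        setattr(source, attribute, target)
    return target

breaker = Breaker('b1')
terminal = Terminal('t1')
set_edge(terminal, 'ConductingEquipment', breaker)  # forward edge (single-valued)
set_edge(breaker, 'Terminals', terminal)            # reverse edge (list-valued)
set_edge(breaker, 'Terminals', terminal)            # repeated call adds nothing
```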
create_value(graph, cim_class, identifier, attribute, value, datatype_uri=None)¶
Sets a primitive attribute value with proper type conversion.
Parameters:
graph (Graph): The typed property graph
cim_class (type): The object's class type
identifier (UUID): The object's UUID
attribute (str): The attribute name
value (str): The string value to convert and set
datatype_uri (str): Optional RDF datatype URI for CIM units
Returns:
bool|int|float|str|object: The converted value
Implementation Details:
- Validates attribute using check_attribute()
- Handles CIM units when datatype_uri is provided:
  - Extracts unit class name from URI
  - Parses unit and multiplier (e.g., "MVA" -> "VA" + "M")
  - Creates CIMUnit instance with proper conversions
- Type conversions for primitives:
  - bool: Converts "true"/"1" to True, "false"/"0" to False
  - int: Converts to integer via float (handles "123.0")
  - float: Direct float conversion
  - list: Appends without duplicates
  - Other types: Sets as string
- Logs warnings for type conversion failures
Source: cimgraph/databases/__init__.py:110
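The primitive conversions listed above can be sketched as a single function. convert_primitive is a hypothetical name, CIM unit handling and list appending are left out, and raising ValueError on an unrecognized boolean is this sketch's choice (the documented method logs a warning instead).

```python
def convert_primitive(value, target_type):
    """Convert the text of an XML element to the declared Python type."""
    if target_type is bool:
        lowered = value.strip().lower()
        if lowered in ('true', '1'):
            return True
        if lowered in ('false', '0'):
            return False
        raise ValueError(f'cannot interpret {value!r} as bool')
    if target_type is int:
        # Going through float lets strings such as "123.0" parse as integers
        return int(float(value))
    if target_type is float:
        return float(value)
    return value  # other types are kept as strings

examples = [
    convert_primitive('true', bool),
    convert_primitive('0', bool),
    convert_primitive('123.0', int),
    convert_primitive('1.5', float),
    convert_primitive('Line_650632', str),
]
```

A plain int('123.0') would raise ValueError, which is why the conversion passes through float first.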
add_to_graph(obj, graph)¶
Adds an existing CIM object to the graph.
Parameters:
obj (object): A dataclass instance inheriting from Identity
graph (Graph): The typed property graph
Returns: None
Implementation Details:
- Creates class type entry in graph if not present
- Adds instance to graph using its identifier as key
- Does not overwrite existing instances
Source: cimgraph/databases/__init__.py:275
Performance Considerations¶
Load Times¶
Typical load times for standard IEEE test cases:
| Model | Objects (approx.) | Lines (approx.) | Load Time |
|---|---|---|---|
| IEEE 13-bus | ~100 | ~10 | < 1 second |
| IEEE 123-bus | ~800 | ~100 | ~2-3 seconds |
| IEEE 8500-node | ~20,000 | ~8,000 | ~12 seconds |
Optimization Strategies¶
For large models (> 5,000 nodes), consider:
- Use a Database Backend: Blazegraph, GraphDB, or Neo4j provide much faster access
- Incremental Loading: Load only the equipment classes you need using selective queries
- Caching: Save the graph as a Python pickle for repeated use
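The caching strategy could be sketched as below. load_with_cache and build_graph are hypothetical names. Two caveats apply: pickle cannot serialize a defaultdict whose default factory is a lambda (such as the structure created in connect()), so a graph of that shape would need converting to plain dictionaries first, and pickle files should only be loaded from trusted sources.

```python
import pickle
import tempfile
from pathlib import Path

def load_with_cache(cache_path, build):
    """Return the cached object if present; otherwise build it and write the cache."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open('rb') as handle:
            return pickle.load(handle)
    result = build()
    with cache_path.open('wb') as handle:
        pickle.dump(result, handle)
    return result

build_calls = []

def build_graph():
    """Hypothetical stand-in for an expensive XML load."""
    build_calls.append(1)
    return {'Breaker': {'b1': {'name': 'brkr1'}}}

with tempfile.TemporaryDirectory() as directory:
    cache_file = Path(directory) / 'graph.pkl'
    first = load_with_cache(cache_file, build_graph)   # builds and writes the cache
    second = load_with_cache(cache_file, build_graph)  # served from the cache
```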
Memory Usage¶
The XML parser loads the entire model into memory. Approximate memory requirements:
- Small models (< 1000 objects): ~10-20 MB
- Medium models (1000-5000 objects): ~50-100 MB
- Large models (> 10,000 objects): ~500 MB+
Threading Note¶
The code includes a commented-out ThreadPoolExecutor implementation for parallel edge parsing (lines 183-186 in the source). This may be enabled in future versions for improved performance.
Example: Complete Workflow¶
This example demonstrates a complete workflow: loading, modifying, and saving a CIM model.
import os
os.environ['CIMG_CIM_PROFILE'] = 'cimhub_2023'
import cimgraph.data_profile.cimhub_2023 as cim
from cimgraph.databases import XMLFile
from cimgraph.models import FeederModel
# 1. Load model from XML
file = XMLFile(filename='../../sample_models/ieee13.xml')
network = FeederModel(container=cim.Feeder(), connection=file)
# 2. Modify the model
for line in network.graph.get(cim.ACLineSegment, {}).values():
    if line.length is not None:
        # Increase all line lengths by 10%
        line.length = line.length * 1.1
# 3. Save modified model
output_file = XMLFile(filename='../../sample_models/ieee13_modified.xml')
output_file.upload(network.graph)
print("Model modified and saved successfully")
File ../../sample_models/ieee13.xml not found. Defaulting to empty network graph
No root element found in XML file
File ../../sample_models/ieee13_modified.xml not found. Defaulting to empty network graph
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[8], line 19
     17 # 3. Save modified model
     18 output_file = XMLFile(filename='../../sample_models/ieee13_modified.xml')
---> 19 output_file.upload(network.graph)
     21 print("Model modified and saved successfully")
File ~/CIM-Graph/cimgraph/databases/fileparsers/xml_parser.py:345, in XMLFile.upload(self, graph)
    343 rdf_header = 'rdf:ID="'
    344 rdf_resource = '#'
--> 345 f = open(self.filename, 'w', encoding='utf-8')
    346 header = '<?xml version="1.0" encoding="utf-8"?>\n'
    347 header += '<!-- un-comment this line to enable validation\n'
FileNotFoundError: [Errno 2] No such file or directory: '../../sample_models/ieee13_modified.xml'
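The FileNotFoundError in the output above comes from open(..., 'w'), which creates a missing file but not a missing parent directory. One way to guard against it, sketched here with a hypothetical helper and a temporary directory, is to create the output directory before constructing the XMLFile used for writing.

```python
import tempfile
from pathlib import Path

def prepare_output_path(filename):
    """Create the parent directory of filename if needed and return the path."""
    path = Path(filename)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path

with tempfile.TemporaryDirectory() as base:
    # Hypothetical output location whose directories do not exist yet
    target = prepare_output_path(Path(base) / 'sample_models' / 'ieee13_modified.xml')
    parent_created = target.parent.is_dir()
    with target.open('w', encoding='utf-8') as handle:
        handle.write('<?xml version="1.0" encoding="utf-8"?>\n')
    file_written = target.exists()
```

The same run also reports that the input file was not found, so the uploaded graph would have been empty even with a valid output directory; checking the input path first is worthwhile.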
Troubleshooting¶
Common Issues¶
1. File Not Found
File model.xml not found. Defaulting to empty network graph
- Verify the file path is correct
- Use absolute paths or paths relative to working directory
2. Namespace Errors
Unable to parse <Element...>. This may be caused by an invalid namespace
- Check that XML file has proper namespace declarations
- Verify CIM profile matches the XML file's CIM version
- Use the namespaces parameter to override if needed
3. Class Not in Profile
ClassName not in data profile
- Ensure the correct CIM profile is loaded
- Check if the XML uses a newer/older CIM version than the profile
- Adjust CIMG_VALIDATION_LOG_LEVEL to filter warnings
4. UUID Parsing Warnings
Unable to parse URI. Check the IEC61970-301 serialization
- Verify XML follows IEC 61970-301 standard
- Check rdf:about and rdf:ID format
- May occur with non-standard mRID formats (handled gracefully)