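I've just started with Python's SAX module and worked out the following example, which I'd like to share.
The sample XML document (save the following as test.xml, and check that the path passed to parse() in the main block points to it):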
<?xml version="1.0" encoding="utf-8"?>
<root>
    <person age="18">
        <name>张三</name>
        <sex>男人</sex>
    </person>
    <person age="19" des="hello">
        <name>李四</name>
        <sex>女人</sex>
    </person>
    <person age="18" des="变态">
        <name>王五</name>
        <sex>不男不女</sex>
    </person>
</root>
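To see which callbacks a SAX parser fires for a document like this one, here is a small sketch that logs every event. It uses xml.sax.parseString on an in-memory copy of one person from test.xml, so no file path is involved. The names TraceHandler and XML_BYTES are my own, and whitespace-only text chunks (the newlines and indentation between tags) are skipped only to keep the log readable.

```python
import xml.sax

# One <person> from test.xml, embedded as UTF-8 bytes so no file is needed.
XML_BYTES = """<?xml version="1.0" encoding="utf-8"?>
<root>
  <person age="19" des="hello">
    <name>李四</name>
    <sex>女人</sex>
  </person>
</root>""".encode("utf-8")


class TraceHandler(xml.sax.ContentHandler):
    """Record every SAX callback so the event order can be inspected."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # attrs behaves like a read-only mapping of attribute name -> value
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # Skip whitespace-only runs between tags to keep the log short.
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


handler = TraceHandler()
xml.sax.parseString(XML_BYTES, handler)
for event in handler.events:
    print(event)
```

The log shows that text arrives through characters() between the start and end events of its element. A handler therefore has to remember which element it is inside, which is what CurrentTag does in the code below.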
The Python code that processes this XML file:
#!/usr/bin/env python3
# coding: utf-8
import xml.sax
from pandas import DataFrame  # used at the end to convert the dict into a pandas.DataFrame


class PersonHandler(xml.sax.ContentHandler):
    """ContentHandler subclass that collects each <person> element into a dict."""

    def __init__(self, records):
        super().__init__()
        self.CurrentTag = ""
        self.id = 0
        self.record = {}
        self.records = records

    # element start event
    def startElement(self, tag, attributes):
        self.CurrentTag = tag
        if tag == "person":
            self.id = self.id + 1
            # copy the person's attributes (age, des) into the record
            for key in attributes.keys():
                self.record[key] = attributes[key]

    # character data event: one text node may arrive in several chunks,
    # so append to what has been collected instead of overwriting it
    def characters(self, content):
        if self.CurrentTag in ("name", "sex"):
            self.record[self.CurrentTag] = self.record.get(self.CurrentTag, "") + content

    # element end event
    def endElement(self, tag):
        if tag == "person":
            # this record is finished
            self.records[self.id] = self.record
            self.record = {}
        # reset on every end tag so the whitespace between elements is not
        # attributed to the element that just closed
        self.CurrentTag = ""


if __name__ == "__main__":
    parser = xml.sax.make_parser()
    parser.setFeature(xml.sax.handler.feature_namespaces, 0)
    records = {}
    parser.setContentHandler(PersonHandler(records))
    parser.parse("test.xml")
    df = DataFrame.from_dict(records, orient="index")
    print(df)
Output:
age name sex des
1 18 张三 男人 NaN
2 19 李四 女人 hello
3 18 王五 不男不女 变态
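If pandas isn't installed, or you just want to inspect the records before converting them, here is a minimal standard-library-only sketch of the same idea. It uses the common SAX pattern of buffering character chunks and joining them when the field's end tag arrives. The class name BufferedPersonHandler, the TEXT_FIELDS tuple, the .strip() on text and the list-of-dicts result are my own choices, not part of the code above. It parses a two-person excerpt of test.xml from memory.

```python
import xml.sax

# Two <person> elements from test.xml, embedded so the sketch needs no file.
XML_BYTES = """<?xml version="1.0" encoding="utf-8"?>
<root>
  <person age="18">
    <name>张三</name>
    <sex>男人</sex>
  </person>
  <person age="19" des="hello">
    <name>李四</name>
    <sex>女人</sex>
  </person>
</root>""".encode("utf-8")


class BufferedPersonHandler(xml.sax.ContentHandler):
    """Collect each <person> into a dict; text is buffered until its end tag."""

    TEXT_FIELDS = ("name", "sex")  # child elements whose text is kept

    def __init__(self):
        super().__init__()
        self.records = []    # one dict per <person>, in document order
        self._record = None  # the <person> currently being built
        self._buffer = []    # character chunks for the current field

    def startElement(self, tag, attributes):
        if tag == "person":
            # attributes (age, des, ...) become ordinary fields
            self._record = dict(attributes.items())
        elif tag in self.TEXT_FIELDS:
            self._buffer = []

    def characters(self, content):
        # a text node may be delivered in pieces, so collect them all
        self._buffer.append(content)

    def endElement(self, tag):
        if tag in self.TEXT_FIELDS and self._record is not None:
            self._record[tag] = "".join(self._buffer).strip()
        elif tag == "person":
            self.records.append(self._record)
            self._record = None
        # discard whitespace collected between elements
        self._buffer = []


handler = BufferedPersonHandler()
xml.sax.parseString(XML_BYTES, handler)
for record in handler.records:
    print(record)
```

A list of dicts like handler.records can also be passed straight to pandas.DataFrame(). In that case the index starts at 0 instead of the 1-based id used above, and a missing des attribute still shows up as NaN.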