Monday, May 20, 2024
 Popular · Latest · Hot · Upcoming
152
rated 0 times [  154] [ 2]  / answers: 1 / hits: 132903  / 12 Years ago, fri, october 19, 2012, 12:00:00

I'm attempting map HTML into JSON with structure intact. Are there any libraries out there that do this or will I need to write my own? I suppose if there are no html2json libraries out there I could take an xml2json library as a start. After all, html is only a variant of xml anyway right?



UPDATE: Okay, I should probably give an example. What I'm trying to do is the following. Parse a string of html:



<div>
<span>text</span>Text2
</div>


into a json object like so:



{
type : div,
content : [
{
type : span,
content : [
Text2
]
},
Text2
]
}


NOTE: In case you didn't notice the tag, I'm looking for a solution in Javascript


More From » html

 Answers
14

I just wrote this function that does what you want; try it out let me know if it doesn't work correctly for you:


// Test with an element.
var initElement = document.getElementsByTagName("html")[0];
var json = mapDOM(initElement, true);
console.log(json);

// Test with a string.
initElement = "<div><span>text</span>Text2</div>";
json = mapDOM(initElement, true);
console.log(json);

function mapDOM(element, json) {
var treeObject = {};

// If string convert to document Node
if (typeof element === "string") {
if (window.DOMParser) {
parser = new DOMParser();
docNode = parser.parseFromString(element,"text/xml");
} else { // Microsoft strikes again
docNode = new ActiveXObject("Microsoft.XMLDOM");
docNode.async = false;
docNode.loadXML(element);
}
element = docNode.firstChild;
}

//Recursively loop through DOM elements and assign properties to object
function treeHTML(element, object) {
object["type"] = element.nodeName;
var nodeList = element.childNodes;
if (nodeList != null) {
if (nodeList.length) {
object["content"] = [];
for (var i = 0; i < nodeList.length; i++) {
if (nodeList[i].nodeType == 3) {
object["content"].push(nodeList[i].nodeValue);
} else {
object["content"].push({});
treeHTML(nodeList[i], object["content"][object["content"].length -1]);
}
}
}
}
if (element.attributes != null) {
if (element.attributes.length) {
object["attributes"] = {};
for (var i = 0; i < element.attributes.length; i++) {
object["attributes"][element.attributes[i].nodeName] = element.attributes[i].nodeValue;
}
}
}
}
treeHTML(element, treeObject);

return (json) ? JSON.stringify(treeObject) : treeObject;
}

Working example: http://jsfiddle.net/JUSsf/ (Tested in Chrome, I can't guarantee full browser support - you will have to test this).


​It creates an object that contains the tree structure of the HTML page in the format you requested and then uses JSON.stringify() which is included in most modern browsers (IE8+, Firefox 3+ .etc); If you need to support older browsers you can include json2.js.


It can take either a DOM element or a string containing valid XHTML as an argument (I believe, I'm not sure whether the DOMParser() will choke in certain situations as it is set to "text/xml" or whether it just doesn't provide error handling. Unfortunately "text/html" has poor browser support).


You can easily change the range of this function by passing a different value as element. Whatever value you pass will be the root of your JSON map.


[#82470] Thursday, October 18, 2012, 12 Years  [reply] [flag answer]
Only authorized users can answer the question. Please sign in first, or register a free account.
kaleyv

Total Points: 259
Total Questions: 99
Total Answers: 107

Location: Saint Helena
Member since Tue, Nov 3, 2020
4 Years ago
;