Asked Monday, May 27, 2019 · 1 answer

I want to parse some HTML with the htmlparser2 module for Node.js. My task is to find a specific element by its ID and extract its text content.


I have read the documentation (which is quite limited) and I know how to set up my parser with the onopentag function, but it only gives access to the tag name and its attributes (I cannot see the text). The ontext function extracts all text nodes from the given HTML string, but ignores all markup.


So here's my code.


const htmlparser = require("htmlparser2");
const file = '<h1 id="heading1">Some heading</h1><p>Foobar</p>';

const parser = new htmlparser.Parser({
    onopentag: function(name, attribs){
        if (attribs.id === "heading1"){
            console.log(/*how to extract text so I can get "Some heading" here*/);
        }
    },

    ontext: function(text){
        console.log(text); // Some heading \n Foobar
    }
});

parser.parseComplete(file);

I expect the output to be 'Some heading'. I believe there is some obvious solution, but somehow it escapes me.


Thank you.



 Answers

You can do it like this using the library you asked about:



const htmlparser = require('htmlparser2');
const domUtils = require('domutils');

const file = '<h1 id=heading1>Some heading</h1><p>Foobar</p>';

// DomHandler builds a DOM tree and calls back once parsing is complete.
const handler = new htmlparser.DomHandler(function(error, dom) {
    if (error) {
        console.log('Parsing had an error');
        return;
    }

    // Find the first element whose id attribute is "heading1".
    const item = domUtils.findOne(element => element.attribs.id === 'heading1', dom);

    if (item) {
        // Assumes the element's first child is a text node.
        console.log(item.children[0].data);
    }
});

const parser = new htmlparser.Parser(handler);
parser.write(file);
parser.end();


The output you will get is Some heading. However, in my opinion, you will find it easier to use a querying library that is meant for this. You don't need to, of course, but note how much simpler the code in the question "How do I get an element name in cheerio with node.js" is.
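
To give a rough idea of what the Cheerio route looks like, here is a minimal sketch. It assumes cheerio has been installed from npm; the helper name textById is made up for illustration, and the id is assumed to be a plain identifier that needs no escaping inside a CSS selector.

```javascript
// Hypothetical helper (not part of cheerio): returns the text of the element
// with the given id, or null when nothing matches.
function textById(html, id) {
    const cheerio = require('cheerio'); // assumes `npm install cheerio`
    const $ = cheerio.load(html);

    // Assumes `id` is a simple identifier that is safe to use after "#".
    const match = $(`#${id}`);
    return match.length > 0 ? match.text() : null;
}
```

Calling textById(file, 'heading1') with the HTML string from the question should give you the heading text.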



Cheerio, or a querySelector-style API such as https://www.npmjs.com/package/node-html-parser if you prefer native-style query selectors, is much leaner.



For comparison, node-html-parser lets you query the document directly:



const { parse } = require('node-html-parser');

const file = '<h1 id=heading1>Some heading</h1><p>Foobar</p>';
const root = parse(file);
const text = root.querySelector('#heading1').text;
console.log(text);
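
If you would rather stay with the streaming callbacks from the question, one possible approach is to track whether the parser is currently inside the target element and only collect text while it is. The sketch below is just that, a sketch: createTextCollector and extractTextById are hypothetical names, not part of htmlparser2, and it relies on htmlparser2 calling onclosetag for every element it opened.

```javascript
// Hypothetical helper (not part of htmlparser2): builds a handler object that
// collects the text inside the first element whose id equals targetId.
function createTextCollector(targetId) {
    let depth = 0;     // nesting depth inside the target; 0 means outside it
    let done = false;  // true once the target element has been closed
    let text = '';

    return {
        onopentag(name, attribs) {
            if (depth > 0) {
                depth += 1;                 // nested element inside the target
            } else if (!done && attribs.id === targetId) {
                depth = 1;                  // entering the target element
            }
        },
        ontext(chunk) {
            if (depth > 0) text += chunk;   // keep text only while inside
        },
        onclosetag() {
            if (depth > 0) {
                depth -= 1;
                if (depth === 0) done = true; // left the target element
            }
        },
        getText() {
            return text;
        },
    };
}

// Hypothetical wrapper: runs the HTML through htmlparser2's Parser.
function extractTextById(html, targetId) {
    const { Parser } = require('htmlparser2'); // assumes `npm install htmlparser2`
    const collector = createTextCollector(targetId);
    const parser = new Parser(collector);
    parser.write(html);
    parser.end();
    return collector.getText();
}
```

With the string from the question, extractTextById(file, 'heading1') should return the heading text without building a DOM tree first. Counting depth rather than using a simple flag means text in nested tags such as <b> inside the heading is included as well.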

Answered Thursday, May 23, 2019
jazminkyrap