I receive HTML like the sample below from a server. I rebuild the textual part by evaluating the XPath expression @"//text()" and appending each result's "nodeContent" value to a string. The code is something like this:
for (int i = 2; i < [resultXPathQuery count]; i++) {
    [mytext appendString:[[resultXPathQuery objectAtIndex:i] objectForKey:@"nodeContent"]];
    [mytext appendString:@"\n"];
}
I obtain:
Line 1
line 2
line 3
line 4
How can I build the textual part so that the empty nodes are also taken into account?
I would like to obtain:
Line 1
line 2

line 3



line 4
<html><head><title>A title</title><style type="text/css">ol{margin:0;padding:0}p{margin:0}.c0{font-size:12pt;background-color:#ffffff;font-family:Times New Roman}.c6{width:432.0pt;background-color:#ffffff;padding:72.0pt 90.0pt 72.0pt 90.0pt}.c7{color:#aaaaaa;font-family:Times New Roman}.c3{color:#0000ee;text-decoration:underline}.c5{color:inherit;text-decoration:inherit}.c2{font-size:12pt;font-family:Times New Roman}.c4{height:12pt}.c1{direction:ltr}body{color:#000000;font-size:12pt;font-family:Times New Roman}h1{padding-top:12.0pt;line-height:1.0;text-align:left;color:#000000;font-size:24pt;font- family:Times New Roman;font-weight:bold;padding-bottom:12.0pt}h2{padding-top:11.25pt;line-height:1.0;text-align:left;color:#000000;font-size:18pt;font-family:Times New Roman;font-weight:bold;padding-bottom:11.25pt}h3{padding-top:12.0pt;line-height:1.0;text-align:left;color:#000000;font-size:14pt;font-family:Times New Roman;font-weight:bold;padding-bottom:12.0pt}h4{padding-top:12.75pt;line-height:1.0;text-align:left;color:#000000;font-size:12pt;font-family:Times New Roman;font-weight:bold;padding-bottom:12.75pt}h5{padding-top:12.75pt;line-height:1.0;text-align:left;color:#000000;font-size:9pt;font-family:Times New Roman;font-weight:bold;padding-bottom:12.75pt}h6{padding-top:18.0pt;line-height:1.0;text-align:left;color:#000000;font-size:8pt;font-family:Times New Roman;font-weight:bold;padding-bottom:18.0pt}</style></head><body class="c6"><p class="c1"><span class="c2">A title</span></p><p class="c1 c4"><span class="c2"></span></p><p class="c4 c1"><span class="c2"></span></p><p class="c1"><span class="c7">Line 1</span></p><p class="c1"><span class="c7">line 2</span></p><p class="c4 c1"><span class="c7"></span></p><p class="c1"><span class="c7">line 3</span></p><p class="c4 c1"><span class="c7"></span></p><p class="c4 c1"><span class="c7"></span></p><p class="c3 c2"><span class="c1"></span></p><p class="c1"><span class="c7">line 4</span></p></body></html>
EDIT
Actually, I noticed that the HTML can be more "complicated", so selecting all the span elements or all the p elements is not enough. Moreover, several span elements can appear in the same p element, and in that case I must not start a new line in my string.
This is the body of a more complicated HTML response:
<body class="c13"><p class="c5"><span>gfgfgfd</span></p><p class="c1"><span></span></p><p class="c5 c10"><span>ghhgfhgfh hghg hgkfhjgk ghjgkh ghjgjhg gjhjg gjhj gjhgjhgjhg gfhjkgjg jghjgfhjgf fghfj jghfj fghjggf jhgjgjgkjg</span></p><p class="c1 c10"><span></span></p><p class="c4"><span>gfgfgfd</span></p><p class="c4"><span>f</span></p><p class="c4"><span>gfdgfdg</span><span class="c7">hg</span></p><p class="c4"><span class="c7">ghgfhgfh</span></p><p class="c4"><span class="c7">gfhgfhgf</span></p><p class="c5"><span class="c7">hgfh </span><span class="c0">gfdgfg</span></p><p class="c5"><span class="c0">fgfdgfdgfd</span></p><p class="c5"><span class="c0">gdfgdfgfd</span></p><p class="c5"><span class="c0">gfgf</span></p><p class="c1"><span class="c0"></span></p><p class="c5"><span class="c0 c8"><a class="c12" href="http://www.google.com">www.google.com</a></span></p><p class="c1"><span class="c0"></span></p><p class="c5"><span class="c0">fgfdgfdg</span></p><p class="c5"><span class="c0">fgffgfdgfg</span><span class="c0 c11">gfgfdgfd fgd fd</span><span class="c0">fdgfdg</span></p><p class="c5"><span class="c0">fgfdgfdgf</span></p><p class="c5"><span class="c0">gfd</span></p><p class="c5"><span class="c0">gfgf</span></p><p class="c1"><span class="c0"></span></p><p class="c5"><span class="c0 c8"><a class="c12" href="mailto:….">...</a></span></p><p class="c1"><span class="c0"></span></p><ol class="c9" start="1"><li class="c3"><span class="c0">gfgfd</span></li><li class="c3"><span class="c0">gfdgfd</span></li><li class="c3"><span class="c0">gfdgfd</span></li><li class="c3"><span class="c0">gdfgfd</span></li></ol><p class="c1"><span class="c0"></span></p><p class="c5"><span class="c0">hgfhgf</span></p><p class="c5"><span class="c0">gfhgfh</span></p><p class="c5"><span class="c0">hgfhgf</span></p><p class="c1"><span class="c0"></span></p><ol class="c2" start="1"><li class="c3"><span class="c0">gfhg</span></li><li class="c3"><span class="c0">hgfh</span></li><li 
class="c3"><span class="c0">hgf</span></li></ol><p class="c1"><span class="c0"></span></p><h1 class="c5 c15"><a name="h.kafwflosthlg"></a><span class="c7 c14">hgfhgfh</span></h1><p class="c1"><span class="c6"></span></p><p class="c1"><span class="c6"></span></p><p class="c1"><span class="c6"></span></p></body>
I need an XPath expression that selects the p, h1, h2, ..., h6, and li elements and handles their inner text in such a way that new lines and empty lines are properly detected.
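To make the requirement concrete, here is a minimal sketch of the kind of result I am after. It is only an illustration under assumptions: it uses NSXMLDocument (available on macOS, not the libxml2 wrapper from my code above), it assumes the markup is well-formed as in the samples, and it assumes the selected block elements are never nested inside one another. The helper name PlainTextFromMarkup is made up for this example. The idea is to select one node per block-level element and take its string value (the concatenation of all descendant text), so that several span elements in one p stay on the same line and an empty p produces an empty line.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of any library. Assumes well-formed markup
// and that the selected block elements are not nested inside one another
// (otherwise the inner text would appear twice).
static NSString *PlainTextFromMarkup(NSString *markup, NSError **error) {
    NSXMLDocument *doc = [[NSXMLDocument alloc] initWithXMLString:markup
                                                          options:NSXMLNodeOptionsNone
                                                            error:error];
    if (doc == nil) {
        return nil;
    }

    // One result node per block-level element under <body>.
    NSString *xpath = @"//body//*[self::p or self::h1 or self::h2 or self::h3 or "
                      @"self::h4 or self::h5 or self::h6 or self::li]";
    NSArray<NSXMLNode *> *blocks = [doc nodesForXPath:xpath error:error];
    if (blocks == nil) {
        return nil;
    }

    NSMutableString *text = [NSMutableString string];
    for (NSXMLNode *block in blocks) {
        // The string value joins all descendant text nodes; an element with
        // no text contributes an empty string, i.e. an empty output line.
        NSString *line = block.stringValue;
        [text appendString:(line != nil ? line : @"")];
        [text appendString:@"\n"];
    }
    return text;
}

int main(void) {
    @autoreleasepool {
        NSString *sample = @"<html><body>"
                           @"<p><span>Line 1</span></p>"
                           @"<p><span></span></p>"
                           @"<p><span>a</span><span>b</span></p>"
                           @"</body></html>";
        NSError *error = nil;
        NSString *text = PlainTextFromMarkup(sample, &error);
        if (text != nil) {
            NSLog(@"%@", text);
        } else {
            NSLog(@"Parsing failed: %@", error);
        }
    }
    return 0;
}
```

The same XPath expression should be usable with libxml2 directly, where xmlNodeGetContent returns the equivalent string value of an element. Whether the "nodeContent" key of the wrapper I am using exposes that value for element nodes is something I have not verified.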