Simple HTML Parsing in Swift

I needed to parse some HTML in a SwiftUI project so I could put the content into a Text view. I couldn’t display the HTML directly, partly because of weird behaviour in WKWebView, but that’s for another post. This was a simple case: I didn’t need any structural information from the HTML, just the text with the tags removed. Parsing HTML is always a bit evil, so I tend to look for built-in services that can do it for me. Here’s what I found.

Using NSAttributedString to load HTML

It turns out that NSAttributedString has a way, provided by WebKit, to load HTML directly, which means I could avoid any external dependencies in my project. Always a nice bonus.

import WebKit

...

let htmlContent = "<div>The text I needed</div>"
guard let data: Data = htmlContent.data(using: .utf8) else {
  // handle the error
  return
}

// Tell the HTML importer explicitly that the bytes are UTF-8 (see below).
NSAttributedString.loadFromHTML(data: data, options: [.characterEncoding: String.Encoding.utf8.rawValue]) { attributed, attrs, error in
  guard error == nil else {
    // handle the error
    return
  }
  if let attributedHtml = attributed {
    // .string gives the plain text with all tags and formatting removed
    print(attributedHtml.string)
  }
}
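Since the goal was to show the result in a SwiftUI Text view, it can be handy to wrap the callback in async/await. This is a minimal sketch, not the original code: `plainText(fromHTML:)` and `HTMLTextView` are hypothetical names (not part of WebKit or SwiftUI), `.task` assumes iOS 15 / macOS 12 or later, and running the import on the main actor is a precaution, assuming WebKit expects to be called from the main thread.

```swift
import SwiftUI
import WebKit

// Hypothetical helper (not a WebKit API): wraps the callback-based
// NSAttributedString.loadFromHTML(data:options:completionHandler:) in async/await.
// Marked @MainActor as a precaution, assuming WebKit wants the main thread.
@MainActor
func plainText(fromHTML html: String) async throws -> String {
    // Encode the HTML as UTF-8 and tell the importer so, to keep accents intact.
    let data = Data(html.utf8)
    return try await withCheckedThrowingContinuation { continuation in
        NSAttributedString.loadFromHTML(
            data: data,
            options: [.characterEncoding: String.Encoding.utf8.rawValue]
        ) { attributed, _, error in
            if let error = error {
                continuation.resume(throwing: error)
            } else {
                // Keep only the plain characters; all formatting is discarded.
                continuation.resume(returning: attributed?.string ?? "")
            }
        }
    }
}

// Hypothetical view that shows the stripped text in a SwiftUI Text view.
struct HTMLTextView: View {
    let html: String
    @State private var text = ""

    var body: some View {
        Text(text)
            .task {
                // Fall back to an empty string if the import fails.
                text = (try? await plainText(fromHTML: html)) ?? ""
            }
    }
}
```

In a view you would then write something like `HTMLTextView(html: "<div>The text I needed</div>")`. Using `try?` quietly swallows errors, so you may prefer to surface them instead.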

I did run into one problem: I was using this on French content with accented characters, and at first the accents were being converted incorrectly. The issue is that by default NSAttributedString does not assume the data is UTF-8. Since the content was UTF-8, the accented characters were misinterpreted. The solution is in the code above: specify the character encoding explicitly with the [.characterEncoding: String.Encoding.utf8.rawValue] option.
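To see why the encoding matters, the snippet below decodes the same UTF-8 bytes twice using plain Foundation. It is an illustration only: ISO Latin-1 stands in as an example of a wrong guess and is an assumption, not a statement about which encoding NSAttributedString actually falls back to.

```swift
import Foundation

// Illustration only: the same UTF-8 bytes decoded with the right and a wrong encoding.
// ISO Latin-1 is just an example of a wrong guess, not NSAttributedString's documented default.
let original = "Café crème"
let utf8Bytes = Data(original.utf8)  // é and è each take two bytes in UTF-8

// Correct: decoding as UTF-8 round-trips the text.
let correct = String(data: utf8Bytes, encoding: .utf8)

// Wrong: ISO Latin-1 reads each byte as its own character,
// so every two-byte accented letter becomes two unrelated characters.
let garbled = String(data: utf8Bytes, encoding: .isoLatin1)

print(correct ?? "decoding failed")  // prints Café crème
print(garbled ?? "decoding failed")  // prints CafÃ© crÃ¨me
```

Garbage like é turning into Ã© is the typical symptom of UTF-8 data being read with a single-byte encoding, which is why stating the encoding explicitly fixes it.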

If you have more complex requirements for working with HTML, something like SwiftSoup might be a better choice.
