Simple HTML Parsing in Swift

I needed to parse some HTML in a SwiftUI project so I could put the content into a Text view. I couldn’t display the HTML directly, partly because of weird behaviour in WKWebView, but that’s for another post. This was a simple case: I didn’t need any structural information from the HTML, just the text with the tags removed. Parsing HTML is always a bit evil, so I tend to look for built-in services that can do it for me. Here’s what I found.

Using NSAttributedString to load HTML

It turns out that NSAttributedString can load HTML directly, which meant I could avoid any external dependencies in my project, always a nice bonus.

import WebKit

...

let htmlContent = "<div>The text I needed</div>"
guard let data = htmlContent.data(using: .utf8) else {
  // handle the error
  return
}

NSAttributedString.loadFromHTML(data: data, options: [.characterEncoding: String.Encoding.utf8.rawValue]) { attributed, attrs, error in
  guard nil == error else {
    // handle the error
    return
  }
  if let attributedHtml = attributed {
    // do something with the text of the result
    print(attributedHtml.string)
  }
}

I did run into one problem: I was using this on French content with accented characters, and at first the accents were being converted incorrectly. The issue seems to be that by default NSAttributedString expects UTF-16, not UTF-8. Since the content was UTF-8, it was misinterpreting the accented characters. The solution, shown in the code above, is to specify the character encoding explicitly by passing [.characterEncoding: String.Encoding.utf8.rawValue] in the options parameter.
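To make the failure mode concrete, here is a minimal sketch of what happens when UTF-8 bytes are decoded with a different encoding. It uses ISO Latin-1 purely as an assumed stand-in for "the wrong encoding"; it is not a claim about which encoding NSAttributedString actually applied, and the sample string is made up for illustration.

```swift
import Foundation

// Hypothetical sample text containing one accented character (é, U+00E9).
let original = "caf\u{E9}"

// In UTF-8 the é is stored as two bytes: C3 A9.
let utf8Bytes = Data(original.utf8)

// Decoding with the matching encoding recovers the original text.
let correct = String(data: utf8Bytes, encoding: .utf8)

// Decoding the same bytes as ISO Latin-1 (assumed wrong encoding, for
// illustration only) turns each byte into its own character.
let garbled = String(data: utf8Bytes, encoding: .isoLatin1)

print(correct ?? "decoding failed")
print(garbled ?? "decoding failed")
```

The bytes are the same in both cases; only the decoder's assumption about the encoding differs, which is why stating the encoding explicitly fixes the problem.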

If you have more complex requirements for working with HTML, something like SwiftSoup might be a better choice.

2 thoughts on “Simple HTML Parsing in Swift”

  1. Hello Stephen,

    Thank you for your post – and hopefully this will assist me with a project I am working on. I would like to implement this in a SwiftUI view, but when I copy the code you posted I get a series of errors.

    Here is my code: (ContentView.swiftui)

    import SwiftUI
    import WebKit

    struct ContentView: View { [error build: Function declares an opaque return type, but has no return statements in its body from which to infer an underlying type]

    var body: some View {

    htmlContent = “The text I needed” [error build: Cannot find ‘htmlContent’ in scope]

    guard let data: Data = htmlContent.data(using: .utf8) [1. error build: Cannot find ‘htmlContent’ in scope
    2. error build: Expected ‘else’ after ‘guard’ condition]
    // handle the error
    return [error build: Non-void function should return a value]

    }

    [errors for the section below:
    1. error build: Expected ‘{‘ in body of function declaration
    2. error build: Expected ‘func’ keyword in instance method declaration
    3. error build: Expected ‘(‘ in argument list of function declaration
    4. error build: Expected ‘(‘ in argument list of function declaration
    5. error build: Expected declaration In declaration of ‘ContentView’]

    NSAttributedString.loadFromHTML(data: data, options: [.characterEncoding: String.Encoding.utf8.rawValue]) { attributed,attrs,error
    guard nil == error else {
    // handle the error
    return
    }
    if let attributedHtml = attributed {
    // do something with the text of the result
    print(attributedHtml.string)
    }
    }
    }

    struct ContentView_Previews: PreviewProvider {
    static var previews: some View {
    ContentView()
    }
    }

    As you can see I literally copied your code straight into my ContentView file. Please will you give me guidance on what I am doing wrong…

    What I want to do is extract/parse selected data from an HTML document in my Documents folder on my Mac to either a textfield or textEditor. Is this possible using your code? Can you give me some guidance please?

  2. There are restrictions on what code you can put in the body of a SwiftUI view. There are a few conditional or assignment statements you can use, but not many. You should declare a function and call it from the view instead. For example, I use something like this:

    func textFromHTML(html: String?, completion: @escaping (String) -> Void) {
      // No input: report an empty string
      guard let htmlContent = html else {
        completion("")
        return
      }
      #if os(iOS)
      guard let data = htmlContent.data(using: .utf8) else {
        completion(htmlContent)
        return
      }
      NSAttributedString.loadFromHTML(data: data, options: [.characterEncoding: String.Encoding.utf8.rawValue]) { attributed, attrs, error in
        // On any failure, fall back to the unparsed HTML
        guard error == nil, let attributedHtml = attributed else {
          completion(htmlContent)
          return
        }
        completion(attributedHtml.string)
      }
      #else
      // Other platforms: pass the HTML through unchanged
      completion(htmlContent)
      #endif
    }
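As a rough sketch of how a function like this might be called from a view, the example below keeps the converted text in a @State property and starts the conversion in onAppear. The ArticleView name and its html property are hypothetical, the textFromHTML here is a condensed copy included only so the example is self-contained, and the hop to the main queue is a precaution rather than a documented requirement.

```swift
import SwiftUI
import WebKit

// Condensed, hypothetical variant of the helper from the reply above.
// It always calls `completion`, falling back to the raw HTML on failure.
func textFromHTML(html: String?, completion: @escaping (String) -> Void) {
    guard let htmlContent = html else {
        completion("")
        return
    }
    guard let data = htmlContent.data(using: .utf8) else {
        completion(htmlContent)
        return
    }
    NSAttributedString.loadFromHTML(
        data: data,
        options: [.characterEncoding: String.Encoding.utf8.rawValue]
    ) { attributed, _, error in
        guard error == nil, let attributed = attributed else {
            completion(htmlContent)
            return
        }
        completion(attributed.string)
    }
}

// Hypothetical view: the body only describes the UI, and the
// conversion is started from onAppear rather than inside the body.
struct ArticleView: View {
    let html: String
    @State private var plainText = ""

    var body: some View {
        Text(plainText)
            .onAppear {
                textFromHTML(html: html) { text in
                    // Update view state on the main queue as a precaution.
                    DispatchQueue.main.async {
                        plainText = text
                    }
                }
            }
    }
}
```

A view such as ArticleView(html: "<div>The text I needed</div>") would start out empty and show the stripped text once the conversion finishes.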
