Using TabularData to Dump Model Data

// Written by Jordan Morgan // May 4th, 2023 // Read it in about 6 minutes // RE: TabularData

The TabularData framework gets its bones by wrangling tables of data to train machine learning models. But don’t let the description on the tin force you to leave it alone: this little guy has a lot of power hiding underneath it.

For example, the framework can be used to:

1. Parse .csv files
2. Parse .json files
3. Import, or export, either of those, and
4. Dump amazing logs in your console from your own models.

Its utility expands beyond that, but I want to show you a little trick I use it for - specifically dealing with point #4 above. Say you’ve got some .json like this:

{
    "people":[
        {
            "name": "David",
            "mobile": 33333333,
            "hasPets": true,
            "pets": ["Dog", "Fish", "Bird"],
            "address": {
                "permanent": "France",
                "current": "UK"
            }
        },
        {
            "name": "Jordan",
            "mobile": 33333333,
            "hasPets": false,
            "pets": [],
            "address": {
                "permanent": "Australia",
                "current": "US"
            }
        }
    ]
}

But imagine it’s bigger, like… a lot bigger. Dare I say massive? Say 1,000 or so entries. If you were to decode such a response using pedestrian models like so:

import Foundation

struct Tennants: Codable {
    let renters: [Renter]
    
    enum CodingKeys: String, CodingKey {
        case renters = "people"
    }
}

struct Renter: Codable {
    let name: String
    let mobile: Int
    let hasPets: Bool
    let pets: [String]
    let address: Address
}

struct Address: Codable {
    let permanent, current: String
}

// Later on...
do {
    let decoder = JSONDecoder()
    // Pass the metatype value (`Tennants.self`), not `Tennants.Type`
    let people = try decoder.decode(Tennants.self, from: data)
    print(people)
} catch {
    Swift.debugPrint("Unable to decode response: \(error.localizedDescription)")
}

and simply print (or dump, Swift.debugPrint, Mirror(reflecting:)) in LLDB, you’d see something like this:

Tennants(renters: [SwiftUI_Playgrounds.Renter(name: "David", mobile: 33333333, hasPets: true, pets: ["Dog", "Fish", "Bird"], address: SwiftUI_Playgrounds.Address(permanent: "France", current: "UK")), SwiftUI_Playgrounds.Renter(name: "Jordan", mobile: 33333333, hasPets: false, pets: [], address: SwiftUI_Playgrounds.Address(permanent: "Australia", current: "US")), SwiftUI_Playgrounds.Renter(name: "Peter", mobile: 33333333, hasPets: true, pets: ["Lizard"], address: SwiftUI_Playgrounds.Address(permanent: "India", current: "FR")), SwiftUI_Playgrounds.Renter(name: "Sarah", mobile: 33333333, hasPets: false, pets: [], address: SwiftUI_Playgrounds.Address(permanent: "Egypt", current: "US")), SwiftUI_Playgrounds.Renter(name: "Rory", mobile: 33333333, hasPets: true, pets: ["Snakes"], address: SwiftUI_Playgrounds.Address(permanent: "United Kingdom", current: "US"))])

It’s not bad in a pinch, certainly serviceable. But, if you put that data into TabularData’s DataFrame and print that? Well, let’s just venture to say it’s a smidge nicer. Check it out:

[Image: the same renter data printed as a DataFrame table in the console]

…is….is this love 😍?

I’ve started using this technique when any of these are true:

- I’m dealing with large amounts of data
- I want to sort or filter data, regardless of size
- I need a better, easier way to visualize the data coming from my own model layer so it’s scannable
- Or, generally, I simply want to read things more easily

…and I adore it. The code to spin it up is trivial:

do {
    let people = try await loadPeople()
    let data = try JSONEncoder().encode(people.renters)

    // Create the DataFrame from .json data
    let dataFrame = try DataFrame(jsonData: data)

    // Beautiful print
    print(dataFrame.description(options: .init(maximumLineWidth: 250)))
} catch {
    Swift.debugPrint("Unable to create DataFrame: \(error.localizedDescription)")
}

Notice the options: parameter. There, you can control cell width, date formatting options, line widths and more. If you want a mountain of data translated into a digestible format quickly, creating throwaway DataFrame instances is a good option.
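To make those knobs a bit more concrete, here’s a minimal sketch of a few of them set at once. The jobs frame and the specific values are made up purely for illustration; tune them to whatever your console can comfortably show.

```swift
import TabularData

// A small, made-up frame just to have something to print.
let jobs: DataFrame = [
    "id": [0, 1, 2],
    "job": ["developer", "designer", "staff software engineer"]
]

// Illustrative values only, not recommendations.
let options = FormattingOptions(
    maximumLineWidth: 120,      // total characters allowed per printed line
    maximumCellWidth: 30,       // cell values longer than this are cut short
    maximumRowCount: 2,         // cap on how many rows get printed
    includesColumnTypes: false  // drop the <Int>/<String> line under the names
)

print(jobs.description(options: options))
```

Capping the row count is the setting I’d reach for first when a response has thousands of entries and you only need a quick peek.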

If you want, you can even slice the data up a bit to get at certain properties. For example, what if I only wanted to know which renters had pets, and what they were?

That’s no issue, as you can get another DataFrame from an existing one - but specify that it should only include a few of the columns you’re interested in:

do {
    let people = try await loadPeople()
    let data = try JSONEncoder().encode(people.renters)

    // Create the DataFrame from .json data
    let dataFrame = try DataFrame(jsonData: data)

    // Just get names and pets
    let partialFrame = dataFrame.selecting(columnNames: "name", "pets")
    print(partialFrame)
} catch {
    Swift.debugPrint("Unable to create DataFrame: \(error.localizedDescription)")
}

// Results in...
┏━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃   ┃ name     ┃ pets                                            ┃
┃   ┃ <String> ┃ <Array<Optional<Any>>>                          ┃
┡━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 0 │ David    │ [Optional(Dog), Optional(Fish), Optional(Bird)] │
│ 1 │ Jordan   │ []                                              │
│ 2 │ Peter    │ [Optional(Lizard)]                              │
│ 3 │ Sarah    │ []                                              │
│ 4 │ Rory     │ [Optional(Snakes)]                              │
└───┴──────────┴─────────────────────────────────────────────────┘
5 rows, 2 columns
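Selecting columns keeps every row, though. If you truly only want the renters who have pets, filtering is the other half of the trick. Here’s a rough sketch that builds a tiny stand-in frame by hand (the column contents are assumptions for the example, not the decoded response from earlier) and filters on the hasPets column:

```swift
import TabularData

// Hand-built stand-in for the decoded renters, so the types are explicit.
var renters = DataFrame()
renters.append(column: Column(name: "name", contents: ["David", "Jordan", "Peter"]))
renters.append(column: Column(name: "hasPets", contents: [true, false, true]))

// The closure receives each cell as an optional Bool.
let petOwners = renters.filter(on: "hasPets", Bool.self) { $0 == true }

// filter returns a DataFrame.Slice; wrap it to get a standalone DataFrame.
let petOwnerFrame = DataFrame(petOwners)
print(petOwnerFrame.selecting(columnNames: "name"))
```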

There are so many flavors of Swift’s functional stuff in there, too - built just for the framework. You get the idea, but real quick - now suppose you wanted only the names, and sorted:

do {
    let people = try await loadPeople()
    let data = try JSONEncoder().encode(people.renters)

    // Create the DataFrame from .json data
    let dataFrame = try DataFrame(jsonData: data)

    // Select only names, and sort them
    let sortedNames = dataFrame.sorted(on: .init("name", String.self), by: { lhs, rhs in
        lhs < rhs
    }).selecting(columnNames: "name")
    print(sortedNames.description)
} catch {
    Swift.debugPrint("Unable to create DataFrame: \(error.localizedDescription)")
}

// Results in...
┏━━━┳━━━━━━━━━━┓
┃   ┃ name     ┃
┃   ┃ <String> ┃
┡━━━╇━━━━━━━━━━┩
│ 0 │ David    │
│ 1 │ Jordan   │
│ 2 │ Peter    │
│ 3 │ Rory     │
│ 4 │ Sarah    │
└───┴──────────┘
5 rows, 1 column

Again, a lot you can do here. You might provide default values, completely transform or combine columns to decode your own models or even print out statistical data using numericSummary.
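As a quick taste of the statistical side, here’s a small sketch using numericSummary on an integer column. The scores frame is invented for the example, and I’m only reading a few of the summary’s properties:

```swift
import TabularData

// Invented data, just enough to summarize.
var scores = DataFrame()
scores.append(column: Column(name: "score", contents: [10, 20, 30, 40]))

// Pull the typed column out, then ask for its summary.
let summary = scores["score", Int.self].numericSummary()
print("mean: \(summary.mean), min: \(summary.min), max: \(summary.max)")
```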

I’m belaboring the point now, but there’s virtually nothing you can’t do with data augmentation or formatting. This specific post shows how I personally use DataFrame with the console, but you can put it to far more practical uses. In fact, that’s what it’s meant for.

Suppose you had two columns representing longitude and latitude coming back from some data source. You could smash them together with combineColumns(_:_:into:transform:) to create CLLocation instances to use while someone is searching for a location in a search bar. Or, you could do SQL-esque joins of two different DataFrame instances using joined(on:).

Snazzy.
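Here’s roughly what the coordinate idea could look like. Treat it as a sketch: the column names and coordinates are placeholders I picked for the example, and the closure simply bails out with nil if either half of the pair is missing.

```swift
import CoreLocation
import TabularData

// Placeholder coordinates in two separate columns.
var places = DataFrame()
places.append(column: Column(name: "latitude", contents: [37.3349, 51.5072]))
places.append(column: Column(name: "longitude", contents: [-122.0090, -0.1276]))

// Build a "location" column from the pair; cells arrive as optionals.
places.combineColumns("latitude", "longitude", into: "location") {
    (latitude: Double?, longitude: Double?) -> CLLocation? in
    guard let lat = latitude, let lon = longitude else { return nil }
    return CLLocation(latitude: lat, longitude: lon)
}

// Read the first combined value back out of the typed column.
let firstLocation = places["location", CLLocation.self][0]
print(firstLocation?.coordinate.latitude ?? .nan)
```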

let concatenatedThoughts = """

There are several ways to spin up a DataFrame, too. You can use local .csv or .json files, or their raw data and more. Be sure to check out the docs.

"""
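For instance, going from raw .csv data to a DataFrame and back out again might look something like this. The CSV contents are made up, and I’m using in-memory data here so nothing touches the file system:

```swift
import Foundation
import TabularData

// Made-up CSV text standing in for a file or network response.
let csv = """
id,job
0,developer
1,designer
"""

// Build a frame from raw CSV bytes.
// DataFrame(contentsOfCSVFile:) does the same from a file URL.
let frame = try DataFrame(csvData: Data(csv.utf8))
print(frame)

// Export back out as Data; writeCSV(to:) writes straight to a file URL instead.
let exported = try frame.csvRepresentation()
print(String(decoding: exported, as: UTF8.self))
```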

The framework deserves its own proper post, quite honestly. If you’ve got any sort of tabular data (i.e. data structured, or unstructured, in rows and columns) you can sort, change, filter or sift through it using its powerful (cliché, I know) APIs.

If you want to mess around with it right now to get your hands dirty, just create a DataFrame using a dictionary literal - and then you can toy around with dumping to the console as we’ve done here or feel out its API:

let testFrame: DataFrame = [
    "id": [0, 1, 2, 3, 4],
    "job": ["developer", "designer", "pm", "em", "staff SWE"],
    "fillBy": ["jordan", "jansyn", "bennett", "remy", "baylor"]
]
        
print(testFrame.description)

// Results in...
┏━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┓
┃   ┃ id    ┃ job       ┃ fillBy   ┃
┃   ┃ <Int> ┃ <String>  ┃ <String> ┃
┡━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━┩
│ 0 │     0 │ developer │ jordan   │
│ 1 │     1 │ designer  │ jansyn   │
│ 2 │     2 │ pm        │ bennett  │
│ 3 │     3 │ em        │ remy     │
│ 4 │     4 │ staff SWE │ baylor   │
└───┴───────┴───────────┴──────────┘
5 rows, 3 columns

This whole post is kinda funny, though, because it goes to show you that if you hold something a little differently than what its instructions say - sometimes you get useful results.

Perhaps Cupertino & Friends™️ didn’t intend for developers to use the TabularData framework for…debug logging? It’s a bit of a lingua franca situation, as a framework built for maximum efficiency in training machine learning models crosses over and talks to the folks who simply want to log out “It works” within LLDB.

But here we are, and it’s incredibly useful for the job.

Until next time ✌️