Using TabularData to Dump Model Data
The TabularData framework makes its bones by wrangling tables of data to train machine learning models. But don’t let the description on the tin scare you off: this little guy has a lot of power hiding underneath.
For example, the framework can be used to:
- Parse .csv files
- Parse .json files
- Import or export either of those, and
- Dump amazing logs in your console from your own models.
Its utility expands beyond that, but I want to show you a little trick I use it for - specifically dealing with point #4 above. Say you’ve got some .json like this:
{
    "people": [
        {
            "name": "David",
            "mobile": 33333333,
            "hasPets": true,
            "pets": ["Dog", "Fish", "Bird"],
            "address": {
                "permanent": "France",
                "current": "UK"
            }
        },
        {
            "name": "Jordan",
            "mobile": 33333333,
            "hasPets": false,
            "pets": [],
            "address": {
                "permanent": "Austrailia",
                "current": "US"
            }
        }
    ]
}
But imagine it’s bigger, like… a lot bigger. Dare I say massive? Say 1,000 or so entries. If you were to decode such a response using pedestrian models like so:
import Foundation
struct Tennants: Codable {
    let renters: [Renter]
    // The top-level JSON key is "people", so map it onto `renters`
    enum CodingKeys: String, CodingKey {
        case renters = "people"
    }
}
struct Renter: Codable {
    let name: String
    let mobile: Int
    let hasPets: Bool
    let pets: [String]
    let address: Address
}
struct Address: Codable {
    let permanent, current: String
}
// Later on...
do {
    let decoder = JSONDecoder()
    // decode(_:from:) takes the metatype value, i.e. `Tennants.self`
    let people = try decoder.decode(Tennants.self, from: data)
    print(people)
} catch {
    Swift.debugPrint("Unable to decode response: \(error.localizedDescription)")
}
and simply print (or dump, Swift.debugPrint, Mirror(reflecting:)) in LLDB, you’d see something like this:
Tennants(renters: [SwiftUI_Playgrounds.Renter(name: "David", mobile: 33333333, hasPets: true, pets: ["Dog", "Fish", "Bird"], address: SwiftUI_Playgrounds.Address(permanent: "France", current: "UK")), SwiftUI_Playgrounds.Renter(name: "Jordan", mobile: 33333333, hasPets: false, pets: [], address: SwiftUI_Playgrounds.Address(permanent: "Austrailia", current: "US")), SwiftUI_Playgrounds.Renter(name: "Peter", mobile: 33333333, hasPets: true, pets: ["Lizard"], address: SwiftUI_Playgrounds.Address(permanent: "India", current: "FR")), SwiftUI_Playgrounds.Renter(name: "Sarah", mobile: 33333333, hasPets: false, pets: [], address: SwiftUI_Playgrounds.Address(permanent: "Egypt", current: "US")), SwiftUI_Playgrounds.Renter(name: "Rory", mobile: 33333333, hasPets: true, pets: ["Snakes"], address: SwiftUI_Playgrounds.Address(permanent: "United Kingdom", current: "US"))])
It’s not bad in a pinch, certainly serviceable. But, if you put that data into TabularData’s DataFrame and print that? Well, let’s just venture to say it’s a smidge nicer. Check it out:
…is….is this love 😍?
I’ve started using this technique when any of these are true:
- I’m dealing with large amounts of data
- Regardless of size, I want to sort or filter data
- I need a better, easier way to visualize the data I’m getting from my own model layer for scannability
- Or generally, I simply want to read things more easily
…and I adore it. The code to spin it up is trivial:
import TabularData
// ...
do {
    let people = try await loadPeople()
    let data = try JSONEncoder().encode(people.renters)
    // Create the DataFrame from .json data
    let dataFrame = try DataFrame(jsonData: data)
    // Beautiful print
    print(dataFrame.description(options: .init(maximumLineWidth: 250)))
} catch {
    Swift.debugPrint("Unable to create DataFrame: \(error.localizedDescription)")
}
Notice the options: parameter. There, you can control cell width, date formatting options, line widths and more. If you want a mountain of data to be translated into a digestible format quickly, creating throwaway DataFrame instances is a good option.
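To make that options: parameter a little more concrete, here is a minimal sketch that spells out the four arguments of the FormattingOptions initializer. The sample frame and the specific values are made up for illustration, so tune them to your own console.

```swift
import TabularData

// Hypothetical sample data, built with a dictionary literal.
let frame: DataFrame = [
    "name": ["David", "Jordan", "Peter"],
    "city": ["Paris", "London", "New York"]
]

// Spell out every argument instead of relying on the defaults.
let options = FormattingOptions(
    maximumLineWidth: 120,     // total characters allowed per printed line
    maximumCellWidth: 30,      // characters allowed per cell before truncation
    maximumRowCount: 2,        // rows printed before the rest are elided
    includesColumnTypes: false // hide the <String>-style type row
)

print(frame.description(options: options))
```

Dropping the type row and capping the row count is handy when you only want a quick glance at the first few entries.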
If you want, you can even slice the data up a bit to get at certain properties. For example, what if I only wanted to know which renters had pets, and what they were?
That’s no issue, as you can get another DataFrame from an existing one - but specify that it should only include a few of the columns you’re interested in:
do {
    let people = try await loadPeople()
    let data = try JSONEncoder().encode(people.renters)
    // Create the DataFrame from .json data
    let dataFrame = try DataFrame(jsonData: data)
    // Just get names and pets
    let partialFrame = dataFrame.selecting(columnNames: "name", "pets")
    print(partialFrame)
} catch {
    Swift.debugPrint("Unable to create DataFrame: \(error.localizedDescription)")
}
// Results in...
┏━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃   ┃ name     ┃ pets                                            ┃
┃   ┃ <String> ┃ <Array<Optional<Any>>>                          ┃
┡━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 0 │ David    │ [Optional(Dog), Optional(Fish), Optional(Bird)] │
│ 1 │ Jordan   │ []                                              │
│ 2 │ Peter    │ [Optional(Lizard)]                              │
│ 3 │ Sarah    │ []                                              │
│ 4 │ Rory     │ [Optional(Snakes)]                              │
└───┴──────────┴─────────────────────────────────────────────────┘
5 rows, 2 columns
There are so many flavors of Swift’s functional stuff in there, too - built just for the framework. You get the idea, but real quick - now suppose you wanted only the names, and sorted:
do {
    let people = try await loadPeople()
    let data = try JSONEncoder().encode(people.renters)
    // Create the DataFrame from .json data
    let dataFrame = try DataFrame(jsonData: data)
    // Select only names, and sort them
    let sortedNames = dataFrame.sorted(on: .init("name", String.self), by: { lhs, rhs in
        lhs < rhs
    }).selecting(columnNames: "name")
    print(sortedNames.description)
} catch {
    Swift.debugPrint("Unable to create DataFrame: \(error.localizedDescription)")
}
// Results in...
┏━━━┳━━━━━━━━━━┓
┃   ┃ name     ┃
┃   ┃ <String> ┃
┡━━━╇━━━━━━━━━━┩
│ 0 │ David    │
│ 1 │ Jordan   │
│ 2 │ Peter    │
│ 3 │ Rory     │
│ 4 │ Sarah    │
└───┴──────────┘
5 rows, 1 column
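Filtering works along the same lines as sorting. Here is a small sketch using filter(on:_:_:), with a made-up frame standing in for the decoded renters; it assumes the dictionary literal infers a Bool column for hasPets, the same way it infers Int and String columns.

```swift
import TabularData

// Hypothetical stand-in for the decoded renters.
let renters: DataFrame = [
    "name": ["David", "Jordan", "Peter"],
    "hasPets": [true, false, true]
]

// The closure receives an optional value, since a cell can be missing.
let petOwners = renters.filter(on: "hasPets", Bool.self) { $0 == true }

// The result is a DataFrame.Slice, which prints as a table like a DataFrame.
print(petOwners)
```

Since the result is a slice of the original frame rather than a copy, it is cheap to create just to peek at a subset in the console.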
Again, there’s a lot you can do here. You might provide default values, completely transform or combine columns to decode your own models, or even print out statistical data using numericSummary.
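As a rough sketch of the statistical side, summary() gives a per-column overview of a whole frame, while numericSummary() works on a single typed numeric column. The frame below is made-up sample data.

```swift
import TabularData

// Hypothetical sample data.
let frame: DataFrame = [
    "id": [0, 1, 2, 3, 4],
    "job": ["developer", "designer", "pm", "em", "staff SWE"]
]

// A new DataFrame describing each column of the original.
print(frame.summary())

// Pull out a typed column and ask for its numeric statistics.
let idSummary = frame["id", Int.self].numericSummary()
print(idSummary.mean, idSummary.min, idSummary.max)
```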
I’m belaboring the point now, but there’s virtually nothing you can’t do with data augmentation or formatting. This specific post shows how I personally use DataFrame with the console, but you can put it to far more practical uses. In fact, that’s what it’s meant for.
Suppose you had two columns representing longitude and latitude coming back from some data source. You could smash them together with combineColumns(_:_:into:transform:) to create CLLocation instances to use while someone is searching for a location in a search bar. Or, you could do SQL-esque joins of two different DataFrame instances using joined(_:on:kind:).
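Here is a hedged sketch of both ideas. The column names, the coordinates and the rent figures are all invented for illustration, and it assumes the dictionary literal infers Double columns for the coordinates.

```swift
import CoreLocation
import TabularData

// Hypothetical data source with separate coordinate columns.
var places: DataFrame = [
    "city": ["Paris", "London"],
    "latitude": [48.8566, 51.5074],
    "longitude": [2.3522, -0.1278]
]

// Merge the two Double columns into a single column of CLLocation values.
places.combineColumns("latitude", "longitude", into: "location") {
    (latitude: Double?, longitude: Double?) -> CLLocation? in
    guard let latitude = latitude, let longitude = longitude else { return nil }
    return CLLocation(latitude: latitude, longitude: longitude)
}

// A second, made-up frame that shares the "city" column.
let rents: DataFrame = [
    "city": ["Paris", "London"],
    "averageRent": [1200, 1800]
]

// SQL-esque join on the shared column (an inner join unless you pass `kind:`).
let joined = places.joined(rents, on: "city")
print(joined)
```

The transform closure gets optionals for both inputs, so returning nil for incomplete rows keeps bad coordinates out of the combined column.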
Snazzy.
let concatenatedThoughts = """
There are several ways to spin up a DataFrame, too. You can use local .csv or .json files, or their raw data and more. Be sure to check out the docs.
"""
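For instance, here is a minimal sketch of going from raw .csv data to a DataFrame and back out again. The inline CSV string is invented sample data standing in for a file or network response.

```swift
import Foundation
import TabularData

// Hypothetical raw CSV, e.g. the contents of a file or a response body.
let csv = """
name,hasPets
David,true
Jordan,false
"""

// Parse the raw bytes into a DataFrame.
let csvFrame = try DataFrame(csvData: Data(csv.utf8))
print(csvFrame)

// Export the frame back out as CSV data.
let exported = try csvFrame.csvRepresentation()
print(String(decoding: exported, as: UTF8.self))
```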
The framework deserves its own proper post, quite honestly. If you’ve got any sort of tabular data (i.e. data structured, or unstructured, in rows and columns) you can sort, change, filter or sift through it using its powerful (cliché, I know) APIs.
If you want to mess around with it right now to get your hands dirty, just create a DataFrame using a dictionary literal - and then you can toy around with dumping to the console as we’ve done here or feel out its API:
let testFrame: DataFrame = [
    "id": [0, 1, 2, 3, 4],
    "job": ["developer", "designer", "pm", "em", "staff SWE"],
    "fillBy": ["jordan", "jansyn", "bennett", "remy", "baylor"]
]
print(testFrame.description)
// Results in...
┏━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┓
┃   ┃ id    ┃ job       ┃ fillBy   ┃
┃   ┃ <Int> ┃ <String>  ┃ <String> ┃
┡━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━┩
│ 0 │     0 │ developer │ jordan   │
│ 1 │     1 │ designer  │ jansyn   │
│ 2 │     2 │ pm        │ bennett  │
│ 3 │     3 │ em        │ remy     │
│ 4 │     4 │ staff SWE │ baylor   │
└───┴───────┴───────────┴──────────┘
5 rows, 3 columns
This whole post is kinda funny, though, because it goes to show you that if you hold something a little differently than what its instructions say - sometimes you get useful results.
Perhaps Cupertino & Friends™️ didn’t intend for developers to use the TabularData framework for…debug logging? It’s a bit of a lingua franca situation, as a framework built for maximum efficiency in training machine learning models crosses over and talks to the folks who simply want to log out “It works” within LLDB.
But here we are, and it’s incredibly useful for the job.
Until next time ✌️