Data Collection

The Semantics of Learning: RDF

I’ve always liked niche ideas. Sometimes this works out well, hey bleeding edge and too much knowledge about linux. But also the darker side, why can’t I find 27.5” mtb tires anywhere? For a moment I became very interested in Resource Description Framework (RDF) and how it might apply to database design. I’m a big believer in the grammar of __ projects. I started out doing analysis with dplyr and love the semantics of how it decomposes into subject and verbs. This is a relative perspective, since these concepts are of course based in English. Not all languages function the same; Spanish has adjectives coming after nouns. But, it’s a very appealing framework and feels very natural to me. Decomposing, “Employee A is in department B” and model it as subject (employees), predicate (is in), and object (department). These then become our dimension (employees), table keys (is in), and dimension (department).

The data I decided to use is the mysql employee database sample https://github.com/datacharmer/test_db. This seemed like a good option since there’s a few relations and something common you might encounter in the wild.

Everything was in place to start modeling. This is what I started with:

And eventually reached some kind of data nirvana:

At this point, I started to realize I was mostly making a good model somehow very worse. But, I did have the realization that a fully normalized database is already in the form of subject, predicate, object. In a normalized database you already have “Employee A is in Department B” and in this case you have the SCD2 to answer when Employee A belonged to department C.

This was in no way the result I wanted though! And in full disclosure has taken me over a year to finally write up. I had no clue what to do with this new found info. If anything maybe it will stop someone else from going through a useless modeling exercise. But, I was disappointed with the result. I had further plans involving dbt modeling and an exposition on using SQLite as a data warehouse too. This is the modeling equivalent of translating between cartesian and polar coordinates — useful and interesting, but hardly groundbreaking. In the end, I did learn something new and it’s always valuable to see where your curiosity takes you.

Resources: