Introducing Gaum: An Open Source O/RM That isn’t an O/RM

At ShiftLeft we’re firm believers in the value of open source software. We leverage too many libraries to count, which massively scales our feature velocity and reliability. We also believe in contributing back when and where can so it is my pleasure to introduce you to our latest contribution today:

  • Gaum, a flexible open source library to talk to Postgres

While in someways like an Object-relational mapping (O/RM), it differs in several key ways.

Why did we build Gaum?

Its pretty simply, we needed it to deliver our application security service.

Why did we need Gaum?

The road to hell is paved with good intentions
 — 
Someone a long time ago… that was probably right

We started building our service in early 2017 and our stack looks very different today from what we started with. We had the basic layout:

  • Some aspects of the kind of data we wanted to store
  • Still deciding how to store data
  • Frequency, density, uniqueness?

The value of cross references were mutating along with the inherently evolving roadmap of an early-stage startup. However, our overarching goal remained:

To provide a rock solid service that yields the most useful/actionable data to our users in the most concise possible way

Our initial implementation of much of our API was a bare bones implementation in Go. We manually created simple persistence, which was more than enough at that time. We were focused on building aspects of our service that didn’t require anything more. But then, suddenly, our simple persistence wasn’t enough and we started mutating our structure as the data showed us the path. We had built an elegant and simple solution, but what we needed was a a very flexible and generic one, at least for a period.

You should spend more on differentiating software that makes your business more desirable, and less on software that doesn’t…
 — 
Bruce Perens

We decided that, until we better understood the best way to shape our data, we shouldn’t worry about optimizing the efficiency of storing it. The tricky thing with data efficiency, is that you first need to figure out the best way to extract the information your service requires, in order to determine the best architecture. A parallel could be drawn to a work table, you first need to use it, work on it, live it to analyze the mess and from it obtain a use pattern for your tools and then arrange them.

And there, we made a compromise, an O/RM. O/RMs, like many other technologies that bridge two different paradigms, have their fair share of detractors and supporters… and we’ve certain experienced both sides. At first it was wonderful, in about a week we moved our code base to use the O/RM (I am intentionally omitting the name because I don’t believe in software shaming open source projects) and for a period it was good: We moved the structure of our data, added columns, made queries, moved info and it was all done relatively easily, almost “magically”.

Magic always comes with a price
 — 
Robert Carlyle, representing Rumpelstiltskin.

The cracks on the convenience of an O/RM were quite evident fairly fast, as our data structure stabilized. When you have a solid data shape, you start thinking in powerful queries, products and operations, O/RMs make simple and very common operations easy to introduce but increase greatly the difficulty of non-common ones. Turns out that our honeymoon period O/RM was getting to an end, and our frustrations with it, just beginning.

I hate this O/RM
 — 
everybody at my company

Our code base is pretty clean and therefore adding or removing dependencies is quite easy, yet the sunken cost fallacy was a bit hard to defeat. We stretched the O/RM like the last chewing gum, a lot of creativity was used until we reached a point that we realized we were bypassing most of the O/RM and going to straight raw SQL, which is quite prone to error, it was time for us to reconsider our approach. .

You’re not wrong Walter. You’re just an a******.
 — 
The Dude in`The big lebowsky`

There are many things about our O/RM we appreciated, but many others, while close to what we needed, ultimately weren’t close enough. Many of the approaches felt fresh and fit our model very well. Some others were not.

A big deterrent was non-determinism. There is not a clear path between Go and SQL in many aspects, so behavior in SQL is often hard to predict from the Go counterpart and clear patterns are hard to discern on how different structures are going to translate from one to another. It is impossible to cover all the gaps between two paradigms, that is completely understandable, but the issue is that all the gaps that are not covered are dreadful “edge cases.” Many times I went from method call to doc, to code, to be forced to write a test case to learn what the O/RM would do in a determined situation. Even then my conclusions were only valid for specific version because the behavior was not granted by design. Other limiting aspect of O/RMs is that the way to resolve the numerous edge cases is by defining many corresponding conventions on how to use it. The works most of the time for a basic usage, doesn’t scale as your codebase grows.

The motivation

There were some things though that we really liked about the idea of an O/RM:

  • Expressing most of the SQL in Go, which allows us to catch some of the errors before they reach the DB. However, we needed more checks and more granular expressions to write better queries without incurring in text templates or large chunks of SQL embedded into our codebase which could easily lead to hard to trace or undetected errors.
  • Composition of queries in a programmatic way without the need to operate on strings, with clear defined rules. However, we did miss the ability to print the resulting SQL for analysis or just to use it independently of the O/RM.
  • Tools to retrieve data structs instead of having to pass individual fields as receivers and overall abstraction of the low-level movements of the db. However, we did need more deterministic ways to retrieve this data with less reliance on Go’s struct level tags, which are not type checked, in the name definition and determination of receiver fields.
  • Tools to abstract common things like Transactions, multiple inserts and similar but we wanted to be able to control a bit more these. We know our DB, we just don’t want to write a lot of boilerplate each time. We also wanted precision on cases like multi insert instead of relying on out of band info encoded in the format of the arguments. A good example of this is the handling of data retrieval. It will behave differently if a type or a slice of the same type is passed to the same function or in a different way altogether if a completely different type than the one expected is passed. We’ve used this in our advantage by making sure our O/RM filled data from types with only one field name match. This could as easily have backfired on us because accidental name clash could have given us an object filled with one right field and two wrong ones..

We wanted to:

  • Remove almost all reflection use, except for basic fetching into receivers.
  • Design the API so methods that represent SQL should receive arguments in a way that makes it clear how that will translate into a query from Go.
  • Express very complex queries without having to tweak the library in weird ways like the combination of undocumented features or use of settings that accidentally produced the output we need in SQL, our high level abstractions should be written on top of lower level ones so we can decide to go that path without breaking the designed functionality of the library.
  • Produce a clear definition of what causes the SQL to be generated and what command will take precedence over other when generating the query. We don’t want reflection to suddenly shadow an explicit declaration.
  • Establish clear and consistent rules on when you get an object or a copy of the object.
  • Create comprehensive documentation with inline usage examples for Godoc.
  • Throw errors before unexpected functionality. We wanted to be able to fail when we did not know what was going to happen. We also wanted to strike a balance between complexity and abstraction. We don’t need to express every single thing in Go and we did not want to re-write SQL, we aimed at making the common interactions simple in order to concentrate on getting the complex interactions right.

Essentially, we wanted the library to help us talk to the DB, without being a bottleneck.

The inspiration

Our solution is gaum, a flexible open source library to talk to Postgres. Gaum has several key features, including:

  • A query composer which can produce plain, valid SQL and sanitized arguments to pass directly to a generic connection, if you so desire.
  • A collection of helpers that generate SQL fragments that can be used as parameters for the query composer so you can express as much of your SQL in Go as you want, without being required to.
  • A set of query convenient functions to talk to the db that are also integrated with the composer so you can query the DB directly from the composed SQL object, but you don’t have to.
  • Gaum enables you to pass bare SQL, field to attribute maps for receivers and to use iterators and closures so you can separate your query calling and your data fetching.

We’re optimistic that the flexibility that we’ve built into Gaum will be beneficial for many use cases. By offering a range of options, that let you be as high or low level as you need it to be, we’ve already leveraged Gaum in many ways, but we aren’t done yet. We’ll keep growing Gaum as we gain experience using it to handle data our data and we hope it will be helpful for yours as well!


Introducing Gaum: An Open Source O/RM That isn’t an O/RM was originally published in ShiftLeft Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.



*** This is a Security Bloggers Network syndicated blog from ShiftLeft Blog - Medium authored by Horacio Duran. Read the original post at: https://blog.shiftleft.io/introducing-gaum-an-open-source-o-rm-that-isnt-an-o-rm-e7fd2880396e?source=rss----86a4f941c7da---4

Horacio Duran

Horacio Duran

Horacio Duran is currently an Infrastructure Software Engineer at ShiftLeft. For the past several years, he has been focused on software development in the Cloud and driving efficiencies via Dev-Ops. More recently, his specific passions lie in automated deployment and distributed systems, with the goal of building clouds deployments that self-sustain, heal and upgrade. Prior to ShiftLeft, Horacio was a Software Engineer at Canonical, and before that, he led a cutting edge IT services consulting business. Horacio is based in Argentina, where studied Software Development and Database Design at Universidad Tecnológica Nacional.

horacio-duran has 2 posts and counting.See all posts by horacio-duran