Mango: A No-SQL Schema Inference Engine

Chirag Chauhan

Senior Tech Lead

Jan 30, 2013

Posted in Development

Recently, my team worked on a clustered web-based project with a No-SQL backend, MongoDB to be specific. Since I come from the SQL era, this was the first big No-SQL project I had to deal with, and I learned many things in the process. One of the outcomes was Mango, a schema inference engine, which I describe below.

But first,

What is a No-SQL database?

There is no precise definition of the term “No-SQL” database. It is used to describe a number of recent database engines that break away from the SQL mould and share some common characteristics, such as:

Scalability is the primary objective
ACID compliance is not a necessity
No pre-specified or enforced Schema for tables / collections
Nested documents
Typically, joins are either unsupported or very slow

For these reasons, full support for SQL queries is not possible, hence the term “No-SQL”.

Challenges

While working with such a No-SQL database, I found that the above characteristics bring new features and, along with them, new challenges:

There are no transactions in the database, so if you need them (occasionally), you need to code that logic in the application.
Since documents can be nested, queries are easier to write.
It is easy to mix documents with different properties in a collection. For example, if only a few users in the database need addresses, you can specify the address field in only those users’ documents.

However, because of the last two points (nested documents and varying properties), it becomes difficult over time to know the overall schema of a collection.

To address this problem, my team has been working on a new project that tries to infer the schema of an existing No-SQL database. We have named the tool Mango.

Mango features at a glance

Data exploration
Schema inference
Relation inference

Using Mango

When a user first runs Mango, they are presented with connection options (currently, Mango supports only MongoDB on localhost). After the user approves them, Mango connects to the database backend and presents a list of databases that are available to explore.

After selecting the database, the user can choose from a list of collections in that database.

After choosing a collection, the inference engine starts reading and processing the entries in that collection. It does a recursive analysis of each field in each document in the collection, and then infers the schema from this.
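The recursive analysis of a single document can be pictured with a minimal sketch like the one below. It is not Mango's actual code: the Doc type, the inferFields function and the type names it reports are all assumptions made for illustration, with a document modelled as a plain Scala Map.

```scala
object SchemaSketch {
  // Hypothetical document model: field name -> value, where a value
  // may itself be a nested document.
  type Doc = Map[String, Any]

  // Walks one document recursively and returns a map from dotted
  // field path to the name of the type observed at that path.
  def inferFields(doc: Doc, prefix: String = ""): Map[String, String] =
    doc.flatMap { case (name, value) =>
      val path = if (prefix.isEmpty) name else s"$prefix.$name"
      value match {
        case nested: Map[_, _] =>
          // Recurse into the nested document and also record the parent field.
          inferFields(nested.asInstanceOf[Doc], path) + (path -> "Document")
        case _: Seq[_]        => Map(path -> "Array")
        case _: String        => Map(path -> "String")
        case _: Int | _: Long => Map(path -> "Integer")
        case _: Double        => Map(path -> "Double")
        case _: Boolean       => Map(path -> "Boolean")
        case null             => Map(path -> "Null")
        case other            => Map(path -> other.getClass.getSimpleName)
      }
    }
}
```

Calling SchemaSketch.inferFields on a user document with a nested address yields one entry per path, such as "address.city", which is the kind of per-document result that can then be combined across the whole collection.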

If more than one collection is chosen, Mango will also try to infer the relationships between the collections. This information can be quite complex, so it is presented as a graph.

Illustration 3: What a graph showing relations looks like
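This article does not spell out how relations are detected, so the following is only one plausible heuristic, not Mango's algorithm: treat a field as a reference when every value it holds in one collection also appears as an _id in another collection. The Relation class and inferRelations function are hypothetical names.

```scala
object RelationSketch {
  type Doc = Map[String, Any]

  // A candidate reference: values of `field` in `from` match `_id`s in `to`.
  final case class Relation(from: String, field: String, to: String)

  def inferRelations(collections: Map[String, Seq[Doc]]): Set[Relation] = {
    // Collect the set of _id values present in each collection.
    val ids: Map[String, Set[Any]] = collections.map { case (name, docs) =>
      name -> docs.flatMap(_.get("_id")).toSet
    }
    val found = for {
      (from, docs) <- collections.toSeq
      field <- docs.flatMap(_.keys).distinct
      if field != "_id"
      values = docs.flatMap(_.get(field))
      (to, targetIds) <- ids.toSeq
      // Keep the pair only if every observed value is a known _id elsewhere.
      if to != from && values.nonEmpty && values.forall(targetIds.contains)
    } yield Relation(from, field, to)
    found.toSet
  }
}
```

Each Relation found this way corresponds to one edge in a graph of collections. A heuristic this simple will report false positives on small integer fields, so a real tool would need stricter checks.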

For those who would like to view the inference results as a static document, we have implemented an HTML report generator as well.

Developing Mango

Our natural choice for developing Mango was Scala, because it is fast and portable, and expressing complex algorithms in it is easier thanks to its extensive collection libraries and functional programming features.

For example, the core inference engine is just about 25 lines of code! We first defined a function that merges any two given schemas; then, to merge all the schemas in a collection, we folded over the sequence of rows like this:

val inferredSchema =
  collectionSchema.rowFields
    .foldLeft[Seq[Field]](Nil)(mergeSchemas)
    .sortBy(-_.count)

Here we fold over the row schemas, merging them into one, and then sort the resulting fields by their repeat count, most frequent first. That is four lines of code for something that would have taken reams of code in an imperative-style language!
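To make the fold concrete, here is a minimal, self-contained sketch. The Field class and the body of mergeSchemas are assumptions for illustration (the real definitions are not shown in this article), and rowFields stands in for collectionSchema.rowFields. This version handles only flat field names, not nested documents.

```scala
object MergeSketch {
  // Hypothetical Field: a field name plus the number of rows it was seen in.
  final case class Field(name: String, count: Int = 1)

  // Merges two schemas: counts are added for fields present in both,
  // and fields present in only one schema are carried over unchanged.
  def mergeSchemas(left: Seq[Field], right: Seq[Field]): Seq[Field] =
    (left ++ right)
      .groupBy(_.name)
      .map { case (name, fields) => Field(name, fields.map(_.count).sum) }
      .toSeq

  // One Seq[Field] per row of the collection.
  val rowFields: Seq[Seq[Field]] = Seq(
    Seq(Field("name"), Field("email")),
    Seq(Field("name"), Field("address")),
    Seq(Field("name"), Field("email"))
  )

  // Same shape as the fold in the article: merge every row, then sort by count.
  val inferredSchema: Seq[Field] =
    rowFields
      .foldLeft[Seq[Field]](Nil)(mergeSchemas)
      .sortBy(-_.count)
}
```

With these three rows, inferredSchema lists name (seen 3 times), then email (2), then address (1), which shows at a glance which fields are common and which are rare.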

Of course, writing those few lines of code requires some time and expertise, but it pays off in the long term in less time spent on testing, debugging and maintenance.

Scala will also enable us to easily perform background tasks, to free the GUI thread. We intend to use Actors to implement multi-threaded analysis algorithms.
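As a rough illustration of moving analysis off the GUI thread, the sketch below uses Futures from the Scala standard library rather than Actors, purely to keep the example self-contained. The analyse function is a hypothetical stand-in for an expensive analysis step, not part of Mango.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BackgroundSketch {
  // Stand-in for an expensive analysis: counts how often each field name occurs.
  def analyse(docs: Seq[Map[String, Any]]): Map[String, Int] =
    docs.flatMap(_.keys).groupBy(identity).map { case (k, v) => k -> v.size }

  // Schedules the analysis on the given execution context, so the calling
  // thread (for example, a GUI thread) returns immediately with a Future.
  def analyseInBackground(docs: Seq[Map[String, Any]])(
      implicit ec: ExecutionContext): Future[Map[String, Int]] =
    Future(analyse(docs))
}
```

A GUI would typically attach a callback to the returned Future and update the view when the result arrives, instead of blocking on it.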

We have been using good design patterns while developing Mango, such as immutable constructs and layered classes, and we have been writing unit tests as well. This has led to a robust application.

Future plans

Support for more databases and more f
