We have been working with various enterprises on their mobility solutions, in businesses ranging from hospitality to technology. We have learnt from experience that enterprises generate enormous amounts of data with their tools, which they need to analyze, process, deliver and monetize.
This blog discusses the challenges we faced with MySQL as the data hub for engineering physics measurement tools that we had created for one set of mobile solutions, and how we quickly and successfully migrated the data engineering from MySQL to a NoSQL data technology, MongoDB.
Problem: The tools we were working on generated their results per use in a way that did not allow a fixed schema to be defined up front. The data generated per use could vary from a small set of columns totalling a few KB to a large set of hundreds or thousands of columns totalling hundreds of MB.
For the suite of tools:
- Different tool configurations could generate a different number of resulting data threads, in a scaled-out manner. Each optimization result may contain new and different useful worker data threads that cannot be predicted. A fixed column structure was therefore not a scalable way to save the data.
- Every user of the tool would have their own set of defined configurations, which could change over a series of usage runs. Dynamically scaling the number of configurations saved per software, associating (and isolating) them per software run, and monitoring them individually meant storing a good amount of information.
- The results generated and monitored for one operating system platform will differ from those generated for other platforms going forward. Grouping results, software and configurations in a scaled-up manner was not a feasible approach.
- Approaching these challenges with SQL-based or file-system-based solutions was not viable, as the calculations and the graphical interpretation of the data require quicker responsiveness.
- The scale, amount and type of data we will have for new tools would require us to store it horizontally.
- We also had to think up front about the data that the new tools will generate, in order to manage it in the correct way.
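To make the schema problem above concrete, here is a minimal sketch of how runs of very different shapes could be packed into schema-free documents of the kind a document store such as MongoDB accepts. The function name `build_run_document`, the field names, and the sample values are all hypothetical and only for illustration; they are not the structure we used in production.

```python
from typing import Any


def build_run_document(tool: str, platform: str, user: str,
                       config: dict[str, Any],
                       threads: dict[str, list[float]]) -> dict[str, Any]:
    """Pack one tool run into a single document (hypothetical layout)."""
    return {
        "tool": tool,
        "platform": platform,
        "user": user,
        # The configuration is stored as-is, so each user can have
        # a different set of keys without any schema change.
        "config": dict(config),
        # One entry per data thread; the number of entries is not fixed.
        "threads": {name: list(values) for name, values in threads.items()},
        "thread_count": len(threads),
    }


# A small run with two data threads and one configuration key.
small_run = build_run_document(
    "tool_a", "platform_x", "user_1",
    {"sample_rate_hz": 10},
    {"thread_0": [0.1, 0.2], "thread_1": [0.3, 0.4]},
)

# A much wider run with 500 data threads and a different configuration.
large_run = build_run_document(
    "tool_a", "platform_y", "user_2",
    {"sample_rate_hz": 100, "optimizer": "on"},
    {f"thread_{i}": [float(i)] for i in range(500)},
)

# Both documents can sit side by side in one collection-like list.
runs = [small_run, large_run]
```

With a fixed table, the second run would have needed hundreds of extra columns. Here both runs are plain dictionaries; with a MongoDB driver such as pymongo, a list like `runs` could be handed to a collection's `insert_many`, which is not executed in this sketch.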
Solution:
The solution needed to handle a huge volume of both useful and non-useful data efficiently, so that the resulting information could be stored, managed and interpreted in a seamless manner. The responsiveness, dynamic nature and magnitude of the data were the deciding factors in moving from SQL data store technology to NoSQL data store technology. Our observations already show a significant improvement in result interpretation performance, due to faster data transmission and better manageability. Hence scaling out was preferred over scaling up.
Consider the following sample use case:
- One user run generated a raw CSV of 148 KB with only 24 data columns (worker threads). The tool could have 'M' data worker threads. The magnitudes of 'M' and 'N' can be anything, as a machine can have any number of process threads running at a time and the tool's data-capturing libraries capture all of them.
- Even considering a small subset of the resulting information, the more computation threads you run, the more data you get, both horizontally and vertically.
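The use case above can be sketched as follows: a raw CSV whose column count is not known in advance is turned into one document per row, so a wider CSV simply yields wider documents. The function name `csv_to_documents`, the `run_id` field, and the tiny sample CSVs are assumptions made for illustration; the sketch also assumes every cell is numeric.

```python
import csv
import io


def csv_to_documents(csv_text: str, run_id: str) -> list[dict]:
    """Turn a raw CSV with any number of columns into per-row documents."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row_index, row in enumerate(reader):
        documents.append({
            "run_id": run_id,
            "row": row_index,
            # Column names come from the CSV header, so the width of
            # each document follows the width of the file.
            "values": {column: float(value) for column, value in row.items()},
        })
    return documents


# Two runs with different widths (horizontal) and lengths (vertical).
narrow_csv = "t0,t1\n1.0,2.0\n3.0,4.0\n"
wide_csv = "t0,t1,t2,t3\n1.0,2.0,3.0,4.0\n"

narrow_docs = csv_to_documents(narrow_csv, "run_narrow")
wide_docs = csv_to_documents(wide_csv, "run_wide")

print(len(narrow_docs), len(narrow_docs[0]["values"]))  # 2 2
print(len(wide_docs), len(wide_docs[0]["values"]))      # 1 4
```

No column list is declared anywhere in the sketch, which is the point: the same code handles a run with 24 columns or a run with thousands.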