When a developer downloads the reports for an Android app, they come as a set of CSV files. The “Downloads” file contains the number of downloads for each device ID, while the brand and model of each device ID are listed in a different CSV file. This makes it hard for a developer to see the number of downloads per brand and model. Similarly, there are other situations where a developer (or a non-developer) is handed raw CSV data that needs to be transformed, queried, sorted, filtered, and possibly merged with other data. Tools like Excel provide basic capabilities such as sorting and filtering, but merging data sets and more advanced processing are cumbersome there. Power BI can query data from different sources and display it sorted and filtered, but it is primarily a visualization tool rather than a tool for merging data.
In such cases, developers “had” no choice but to import the CSVs into a database, process the data there, and export the results.
U-SQL provides an elegant solution to this problem. Although it is intended for use with Azure Data Lake, it can also process CSV files in a local environment. It provides built-in Extractors that read data from CSV and TSV files and store it in a rowset whose columns have developer-defined names. This data can then be processed with U-SQL, which has a SQL-like syntax: it can be sorted, filtered, and merged with other data sets. U-SQL also provides built-in Outputters that export the processed data in CSV or TSV format.
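As a minimal sketch of these building blocks (not taken from the original project), the script below reads a single hypothetical file, downloads.csv, assumed to have a header row followed by DeviceId and Downloads columns. It keeps devices above an arbitrary threshold and writes them sorted by downloads. The paths, column names, and threshold of 100 are placeholders; in a local run, the paths resolve against the local DataRoot.

```usql
// Hypothetical input and output paths, relative to the local DataRoot.
DECLARE @in string = "/input/downloads.csv";
DECLARE @out string = "/output/top_devices.csv";

// The built-in CSV extractor maps each column, in order, to a typed name.
// skipFirstNRows skips the (assumed) header row.
@Downloads =
    EXTRACT DeviceId string,
            Downloads long
    FROM @in
    USING Extractors.Csv(skipFirstNRows : 1);

// Filter with a SQL-like SELECT; the threshold is an arbitrary example.
@Filtered =
    SELECT DeviceId,
           Downloads
    FROM @Downloads
    WHERE Downloads > 100;

// Sort while writing, and emit a header row with the column names.
OUTPUT @Filtered
TO @out
ORDER BY Downloads DESC
USING Outputters.Csv(outputHeader : true);
```

Sorting is done in the OUTPUT statement because that is where the row order of the written file is determined.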
To create a file that shows the number of downloads per brand and model, follow these steps:
1. Initialize the locations of the input and output files. These paths are relative to the DataRoot set in the Azure Data Lake options.
2. Extract the data from the input files, i.e. the installs file and the supported devices file. The data is stored in in-memory rowsets (@Downloads and @Model, respectively).
3. Transform the results into the desired shape using U-SQL, which is very similar to SQL.
4. Write the results to the output CSV file using an Outputter.
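Put together, the steps above might look like the following U-SQL script. This is a sketch under stated assumptions, not the author's script from GitHub: the file names, the header rows, and the column layouts (DeviceId and Downloads in the installs file; DeviceId, Brand, and Model in the devices file) are hypothetical. Adjust the EXTRACT schemas so they list the actual report columns in order.

```usql
// Step 1: input and output locations, relative to the local DataRoot.
// File names and column layouts below are hypothetical placeholders.
DECLARE @downloadsFile string = "/input/installs.csv";
DECLARE @devicesFile string = "/input/supported_devices.csv";
DECLARE @outputFile string = "/output/downloads_by_model.csv";

// Step 2: extract both files into in-memory rowsets, skipping header rows.
@Downloads =
    EXTRACT DeviceId string,
            Downloads long
    FROM @downloadsFile
    USING Extractors.Csv(skipFirstNRows : 1);

@Model =
    EXTRACT DeviceId string,
            Brand string,
            Model string
    FROM @devicesFile
    USING Extractors.Csv(skipFirstNRows : 1);

// Step 3: join on device ID (U-SQL uses C# equality, ==)
// and total the downloads for each brand and model.
@Result =
    SELECT m.Brand,
           m.Model,
           SUM(d.Downloads) AS TotalDownloads
    FROM @Downloads AS d
         INNER JOIN @Model AS m
         ON d.DeviceId == m.DeviceId
    GROUP BY m.Brand, m.Model;

// Step 4: write the result, largest totals first, with a header row.
OUTPUT @Result
TO @outputFile
ORDER BY TotalDownloads DESC
USING Outputters.Csv(outputHeader : true);
```

The INNER JOIN drops device IDs that have no brand and model entry; a LEFT OUTER JOIN would keep them, grouped under a null brand and model. If a device ID can appear more than once in the devices file, deduplicate it first (for example with SELECT DISTINCT); otherwise the join counts those downloads once per matching row.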
Together, these U-SQL steps provide a flexible, easy-to-use way to extract the desired data quickly.
The source code can be found on GitHub: