Android download analysis using U-SQL

When a developer downloads the reports for an Android app, they arrive as a set of CSV files. The “Downloads” file contains the number of downloads per device ID, while the brand and model for each device ID are listed in a separate CSV file. This makes it hard for a developer to see the number of downloads per brand and model. There are other situations in which a developer (or a non-developer) is handed raw CSV data that needs to be transformed, queried, sorted, filtered, and perhaps merged with other data. Tools like Excel provide basic capabilities such as sorting and filtering, but merging data sets and more advanced processing are cumbersome there. Power BI can query data from different sources and display it in the desired sort order with filters, but it is more of a visualization tool than a tool for merging data.

In such cases, developers used to have little choice but to import the CSVs into a database, process the data there, and export the results.

U-SQL provides an elegant solution to this problem. Although it is intended for use with Azure Data Lake, it can also process CSVs in a local environment. It provides built-in Extractors that read data from CSV and TSV files into a rowset with developer-defined column names. The rowset can then be processed with U-SQL, whose syntax is SQL-like: the data can be sorted, filtered, and merged with other data sets. U-SQL also provides built-in Outputters that export the result in CSV or TSV format.

To create a file that shows downloads per brand and model, follow the steps outlined below.

0. Initialize
Initialize the locations of the input and output files. These will be relative to the DataRoot set in the Azure Data Lake options.
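
As a rough illustration of this step outside U-SQL, the Python sketch below resolves input and output locations against a single root folder, much as the script's paths are resolved against DataRoot. The root folder and file names are hypothetical and are not taken from the original script.

```python
from pathlib import PurePosixPath

# Hypothetical stand-in for the DataRoot configured in the tool options.
DATA_ROOT = PurePosixPath("/usql/dataroot")

# Hypothetical relative locations, analogous to what the script declares.
INSTALLS_FILE = "input/installs.csv"
DEVICES_FILE = "input/supported_devices.csv"
OUTPUT_FILE = "output/downloads_by_model.csv"


def resolve(relative_path: str) -> PurePosixPath:
    """Return the full location of a path given relative to DATA_ROOT."""
    return DATA_ROOT / relative_path


for name in (INSTALLS_FILE, DEVICES_FILE, OUTPUT_FILE):
    print(resolve(name))
```

Keeping only relative paths in the script means the same script can run against a different root folder without being edited.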

1. Extract
Extract the data from the input files, i.e. the installs file and the supported-devices file. The data is stored in in-memory rowsets (@Model and @Downloads).
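
The idea behind extraction, reading CSV text into named in-memory rows, can be sketched in Python as follows. This is only an analogue of what a U-SQL Extractor does, and the sample data and column names are assumptions; the real report files may use different columns.

```python
import csv
import io

# Hypothetical sample content; the real report columns may differ.
DOWNLOADS_CSV = "Date,Device,Installs\n2017-01-01,dev_a,3\n2017-01-01,dev_b,5\n"
MODELS_CSV = "Brand,MarketingName,Device\nAcme,Acme One,dev_a\nBolt,Bolt X,dev_b\n"


def extract(text, columns):
    """Read CSV text, skip the header row, and apply our own column names."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [dict(zip(columns, row)) for row in reader]


# Two in-memory "rowsets", loosely mirroring @Downloads and @Model.
downloads = extract(DOWNLOADS_CSV, ["date", "device", "installs"])
models = extract(MODELS_CSV, ["brand", "model", "device"])

print(downloads[0])
print(models[0])
```

Note that every value is still a string at this point; any numeric conversion would have to happen in a later step.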

2. Transform
Transform the extracted rowsets into the desired shape using U-SQL. This is very similar to SQL.
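
One plausible shape for this transformation is a join on the device ID followed by a sum per brand and model. The Python sketch below shows that logic on hypothetical rows; it assumes an inner join and a simple sum, which may not match the original query exactly.

```python
from collections import defaultdict

# Hypothetical rows, shaped like the two extracted rowsets.
downloads = [
    {"device": "dev_a", "installs": 3},
    {"device": "dev_b", "installs": 7},
    {"device": "dev_a", "installs": 2},
]
models = [
    {"device": "dev_a", "brand": "Acme", "model": "Acme One"},
    {"device": "dev_b", "brand": "Bolt", "model": "Bolt X"},
]


def downloads_per_model(downloads, models):
    """Join on device ID, then sum installs per (brand, model)."""
    by_device = {m["device"]: (m["brand"], m["model"]) for m in models}
    totals = defaultdict(int)
    for row in downloads:
        key = by_device.get(row["device"])
        if key is not None:  # inner join: rows for unknown devices are dropped
            totals[key] += row["installs"]
    # Largest totals first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


result = downloads_per_model(downloads, models)
print(result)
```

In SQL terms this corresponds to a JOIN on the device column with a GROUP BY on brand and model.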

3. Output
Output the results to the output CSV file using an Outputter.
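
For comparison, writing aggregated rows back out as CSV with a header line can be sketched in Python like this. The rows and header names are hypothetical, and the sketch writes to a string rather than a file so that it is easy to inspect.

```python
import csv
import io

# Hypothetical aggregated rows: (brand, model, downloads).
rows = [("Bolt", "Bolt X", 7), ("Acme", "Acme One", 5)]


def to_csv(rows, header):
    """Serialize rows to CSV text, starting with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(rows, ["Brand", "Model", "Downloads"])
print(csv_text)
```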

Together, these U-SQL steps provide a flexible, easy-to-use way to extract the desired data quickly.

The source can be found on GitHub:

https://github.com/sameerkapps/U-SQL-Android-Download-Analysis

Published by: Sameer Khandekar

This blog is about my hobby projects. The hobby projects include Arduino, IOT, ElectricImp, Mobile Apps and anything that catches interest. The intention of the blog is to share the knowledge. To know more about me: https://sameerkapps.wordpress.com/about/
