Android download analysis using U-SQL

When a developer downloads the reports for an Android app, they arrive as a set of CSV files. The “Downloads” file lists the number of downloads per device ID, but the brand and model for each ID are declared in a different CSV file, which makes it hard to see the number of downloads per brand and model. There are other situations in which a developer (or a non-developer) is handed raw CSV data that needs to be transformed, queried, sorted, filtered, and perhaps merged with other data. Tools like Excel provide basic capabilities such as sorting and filtering, but they do not support merging and advanced processing of the data. Power BI can query data from different sources and display it in the desired sort order with filters, but it is more a visualization tool than a tool for merging data.

In such cases, developers used to have no choice but to import the CSVs into a database, process the data there, and export the result.

U-SQL provides an elegant solution to this problem. Although it is intended for use with Azure Data Lake, it can also process CSVs in a local environment. It provides built-in Extractors that read data from CSV and TSV files and store it in a row set with developer-defined names. The row sets can then be processed with U-SQL, which has SQL-like syntax: the data can be sorted, filtered, and merged with other data sets. U-SQL also provides built-in Outputters that export the processed data in CSV or TSV format.

To create a file that shows downloads per brand and model, follow the steps outlined here.

0. Initialize
Initialize the locations of the input and output files. These will be relative to the DataRoot set in the Azure Data Lake options.


1. Extract
Extract the data from the input files, i.e. the installs file and the supported devices file. The data is stored in in-memory row sets (@Model and @Downloads).


2. Transform
Transform the results into the desired shape using U-SQL. This is very similar to SQL.


3. Output
Output the results to the output CSV file using an Outputter.

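The four steps can be sketched as one short U-SQL script. This is a minimal sketch and not the script from the repository: the file paths, the column names (Device, Installs, Brand, Model) and the @DownloadsByModel row set name are assumptions made for illustration. The real report files contain more columns, and the EXTRACT schema generally has to list every column in the file, so adjust the column lists to match your CSVs.

```usql
// Step 0: file locations, relative to the DataRoot (hypothetical names).
DECLARE @installsFile string = "/input/installs.csv";
DECLARE @devicesFile string = "/input/supported_devices.csv";
DECLARE @outputFile string = "/output/downloads_by_model.csv";

// Step 1: extract both CSVs into row sets, skipping the header row.
// The column lists below are assumed, simplified schemas.
@Downloads =
    EXTRACT Device string,
            Installs int
    FROM @installsFile
    USING Extractors.Csv(skipFirstNRows : 1);

@Model =
    EXTRACT Brand string,
            Model string,
            Device string
    FROM @devicesFile
    USING Extractors.Csv(skipFirstNRows : 1);

// Step 2: join on the device id and total the installs per brand and model.
@DownloadsByModel =
    SELECT m.Brand,
           m.Model,
           SUM(d.Installs) AS Downloads
    FROM @Downloads AS d
         INNER JOIN @Model AS m
         ON d.Device == m.Device
    GROUP BY m.Brand, m.Model;

// Step 3: write the result as CSV with a header row, largest totals first.
OUTPUT @DownloadsByModel
TO @outputFile
ORDER BY Downloads DESC
USING Outputters.Csv(outputHeader : true);
```

Note that U-SQL expressions follow C# rules, so the join condition uses == rather than =. The inner join drops any device id that has no entry in the supported devices file; a LEFT OUTER JOIN would keep those rows with a null brand and model.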

Together, these U-SQL steps provide a flexible, easy-to-use mechanism for extracting the desired data quickly.

The source can be found on GitHub:

https://github.com/sameerkapps/U-SQL-Android-Download-Analysis

Published by: Sameer Khandekar

I am a passionate software engineer who loves to work on Azure microservices, REST API, SDKs, and .NET apps using WPF, Xamarin, and MAUI. The work includes highly scalable geo-distributed services and an authentication library with nearly 500 million downloads. I also had fun integrating with hardware using Bluetooth (BLE). More here: https://www.sameer.blog/about/
