While static analysis suffers from limitations, mainly due to the obfuscation of parts of an application's content, dynamic analysis can circumvent them. However, considerable expertise is required to implement it and to obtain the information essential to a good understanding of how an application operates.
After covering the static analysis of mobile applications at the beginning of the month, let's now look at dynamic analysis, a method used to study how part of a program behaves while it is running. This type of study provides valuable information about the executed package, because we can observe the interactions it generates within its execution environment. In this context, two execution areas should be distinguished: the near environment, that is, the Dalvik area (the name commonly given to the virtual space in which a mobile application runs within Android) together with the application's internal elements; and the remote environment (the Android environment outside the Dalvik area, along with third parties).
In the case of a mobile application, this type of analysis makes it possible to circumvent obfuscation problems because it allows listening mechanisms to be installed on precise elements of the application. The analyst can then retrieve a set of information passing through the program during its operation. However, the difficulty lies in precisely locating the targeted data. As we will see later, some elements can be duplicated or obscured within the program, and in that case it will be necessary to determine the moment at which the data appears decrypted, in the clear.
Here is a non-exhaustive list of the types of information that can be collected using this method:
– Data transfers on the network (HTTP requests);
– File writes and reads;
– Function calls;
– Arguments and function returns.
The analysis procedure
To put it to work, a researcher or analyst first defines the type and precise location of the target. For instance, we might target the returns R of a method M related to data transfers from a form F contained in an activity A. If that data is usually carried by a component of type Array, List, or Map (assuming the application is written in Kotlin), the analyst can listen to this component by defining a listening function (a hook) that brings up the structure and content of the component. This information is then usually stored in a database for further analysis.
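On Android, hooks like this are typically installed with instrumentation frameworks such as Frida or Xposed; the sketch below uses plain Python to show the idea independently of any framework. The target function `transfer_form_data`, the `observations` list standing in for a database, and all names are hypothetical.

```python
import functools

# Simple in-memory "database" of observations; a real setup would
# persist these records (e.g. to SQLite) for later analysis.
observations = []

def hook(func):
    """Wrap a target method so that each call records its arguments
    and return value: the basic idea behind a listening function."""
    @functools.wraps(func)
    def listener(*args, **kwargs):
        result = func(*args, **kwargs)
        observations.append({
            "target": func.__name__,
            "args": args,
            "return": result,
        })
        return result
    return listener

# Hypothetical method M of our example: it "transfers" form data F.
def transfer_form_data(form):
    return {"sent": sorted(form.keys())}

# Install the hook, then exercise the target as the app would.
transfer_form_data = hook(transfer_form_data)
transfer_form_data({"email": "a@b.c", "card": "1234"})

print(observations[0]["return"])  # structure and content brought up by the hook
```

The wrapper is transparent to the application (the return value is passed through unchanged), which is exactly what makes listening points hard for the program to notice.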
Contrary to what one might think, a single localized listening point is not enough. Analysts generally deploy a series of listening points covering a more or less broad field of investigation. This multiplication makes it possible to collect certain values and compare them with one another in order to obtain a precise view of a process.
Let's take our previous example to illustrate these remarks. If the method handles sensitive information (payment methods, authentication data, etc.) for a malicious purpose, an informed analyst will not seek to confirm a result known in advance. Instead, they will try to analyze how the system works, because a malicious system will do everything to make this activity undetectable by various means (encryption, anonymization, splitting, etc.). For this reason, it will be useful for the practitioner to record precisely the state of the values as they are processed by the system, in order to identify the operating mode implemented in the application.
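Recording the state of a value at each processing stage can be sketched as follows. This is an illustrative toy, not a real app: the `process_payment` pipeline is hypothetical, and base64 stands in for actual encryption to keep the example self-contained.

```python
import base64

# Trace of (stage, value) pairs captured as the data moves through
# the hypothetical processing chain of the application.
trace = []

def record(stage, value):
    trace.append((stage, value))
    return value

def process_payment(card_number):
    # Stage 1: the value is still in the clear here.
    clear = record("input", card_number)
    # Stage 2: the app obscures it (base64 stands in for real encryption).
    encoded = record("encoded", base64.b64encode(clear.encode()).decode())
    # Stage 3: the app splits it before sending it out.
    chunks = record("split", [encoded[i:i + 8] for i in range(0, len(encoded), 8)])
    return chunks

process_payment("4111111111111111")

# The analyst inspects the trace to find the stage where the data is clear.
clear_stages = [stage for stage, value in trace if value == "4111111111111111"]
print(clear_stages)
```

Comparing values across listening points is what reveals that only the first stage exposes the data in the clear, which is precisely where a hook should be placed.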
We then understand that dynamic analysis has an exploratory dimension and requires a precise definition of its goals.
The biases of dynamic analysis
This type of analysis, although robust against attempts at obfuscation, nevertheless requires a detailed definition of its objectives. This point is crucial because it explains why it is difficult to fully automate. Although this statement must be nuanced, as we will see in the next part, it explains why its use is more delicate.
But this is not the only cause of this state of affairs. Dynamic analysis also requires a non-negligible degree of expertise on the part of the practitioner. It is easy to obtain data, but tricky to verify where it comes from. Each piece of data extracted by this process must be traceable in the calls, via a comparison with the call graph. In case of repackaging (modification or alteration of the origin of a call), even completely legitimate system libraries can be modified in order to add or alter a feature.
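Checking an observed call against an expected call graph can be sketched like this. The graph, the function names, and the "injected" caller are all hypothetical; Python's `inspect` module stands in for the stack-walking a real instrumentation framework would perform.

```python
import inspect

# A minimal expected call graph: which callers may reach which callees.
# A repackaged application would produce edges absent from this graph.
EXPECTED_CALLERS = {"send_report": {"sync_settings"}}

anomalies = []

def check_origin(callee_name):
    """Compare the actual caller (read from the stack) with the graph."""
    caller = inspect.stack()[2].function
    if caller not in EXPECTED_CALLERS.get(callee_name, set()):
        anomalies.append((caller, callee_name))

def send_report(data):
    check_origin("send_report")
    return len(data)

def sync_settings():
    return send_report("settings")      # legitimate edge of the call graph

def injected_exfiltration():
    return send_report("contacts")      # edge absent from the call graph

sync_settings()
injected_exfiltration()
print(anomalies)  # only the unexpected caller is flagged
```

The point is that the extracted value alone ("settings" vs "contacts") says nothing; it is the origin of the call, compared against the graph, that exposes the alteration.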
Let us pause for two minutes on this last point. Take the example of the library supporting string components: java.lang.String. Frequently used in applications, it can be diverted in order to endow the String component with a behavior, malicious or not, that was not originally intended. Although this library is legitimately called by the application, it is quite possible for a developer to extend its functionality via an overload. This flip side of the coin originates in the type of component used. A string is an Object and therefore has a series of attributes and methods that ensure its operation. Thus, if we limit the analysis to reading the value of the component, we miss crucial information: the alteration of its call. We can then easily see that implementing such a tool requires the analyst not only to understand and know the main types of components present in the application and the operating system, but also the possible alteration mechanisms.
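As a caveat, java.lang.String is declared final in Java, so in practice such alterations go through repackaged call sites or runtime hooks rather than subclassing; Python's `str`, which can be subclassed, makes the underlying idea easy to demonstrate. Everything below (the `TappedString` class, the `leaked` list) is a hypothetical illustration.

```python
# A subclass of str that preserves normal behavior but covertly
# copies every value it is asked to compare.
leaked = []

class TappedString(str):
    def __eq__(self, other):
        leaked.append(str(self))    # side effect hidden inside a core method
        return str.__eq__(self, other)
    __hash__ = str.__hash__         # keep the type usable in dicts/sets

password = TappedString("s3cret")

# Reading the value shows nothing unusual...
print(password)

# ...but a routine comparison triggers the hidden behavior.
password == "guess"
print(leaked)
```

An analysis that only reads the component's value sees an ordinary string; only by watching the calls made on the object does the alteration become visible.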
Another bias of this type of analysis lies in the circumscription of its field of investigation. Indeed, since this type of analysis requires executing the application, it is difficult or even impossible to listen to all the components it contains at once.
For example, a language-learning application that I use daily has no fewer than 1,528 classes and more than 35,000 methods. Global listening would be unstable and would produce a considerable stream of data. It is therefore necessary to restrict the scope. The analysis should therefore first identify the areas on which it has to concentrate.
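Restricting the scope often amounts to filtering a method inventory down to the packages of interest before installing any hooks. The inventory and package names below are invented for illustration.

```python
# Hypothetical method inventory extracted from the APK: (class, method).
inventory = [
    ("com.example.app.LoginActivity", "submitCredentials"),
    ("com.example.app.net.ApiClient", "post"),
    ("kotlin.collections.CollectionsKt", "listOf"),
    ("com.example.app.ui.ThemeHelper", "applyTheme"),
]

# Target only the areas of interest: authentication and network code.
TARGET_PREFIXES = ("com.example.app.net", "com.example.app.Login")

hooks_to_install = [
    (cls, method) for cls, method in inventory
    if cls.startswith(TARGET_PREFIXES)
]
print(hooks_to_install)  # 2 hooks instead of one per method
```

Out of 35,000 methods, only a handful end up instrumented, which keeps the run stable and the collected data stream manageable.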
Finally, the processing times for this type of analysis are long and expensive. The analyses must often be repeated because they are likely to stop abruptly, especially when the practitioner uses an interaction generator such as MonkeyRunner, included in Google's ADB suite. This last point reinforces the idea of targeting prior to the study.
The role of machine learning
Previously, we mentioned that this type of analysis is tricky to automate, but not impossible. If the analyst proceeds with an exploratory approach (they do not yet have a precise idea of the target), then automation will be complex and the analysis may miss crucial information.
However, when the practitioner has a precise idea of the target, regardless of the application analyzed, automation is possible. This scenario is particularly true when designing a model for detecting malicious programs. If the designer focuses on a malware family or subfamily (botnet, rootkit, etc.), they can carry out a preliminary detection of that architecture in the application and then analyze it in depth. To do this, the analyst will use machine learning, creating a database of indicators relevant to the type of malicious program audited, in order to train the algorithm that performs the analyses. The algorithms used are varied (mean-center calculation, support-vector machine, decision tree, etc.), as are the indicators (nature or distribution of arguments, call sequences, call origins, etc.). This approach is particularly used in hybrid analysis, combining static and dynamic analysis.
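Of the algorithms mentioned, the mean-center approach is the simplest to sketch: compute the centroid of the indicator vectors for each class, then assign a new sample to the nearest centroid. The indicator values below are invented toy data, not real malware measurements.

```python
# Toy indicator vectors: (network calls per minute, crypto calls per minute)
# collected from previously labeled samples.
benign  = [(2.0, 1.0), (3.0, 0.0), (1.0, 2.0)]
malware = [(20.0, 15.0), (25.0, 12.0), (18.0, 14.0)]

def centroid(points):
    """Mean center of a set of indicator vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def distance_sq(a, b):
    """Squared Euclidean distance (ordering-equivalent to the distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

centroids = {"benign": centroid(benign), "malware": centroid(malware)}

def classify(sample):
    """Assign the sample to the class whose mean center is nearest."""
    return min(centroids, key=lambda label: distance_sq(sample, centroids[label]))

print(classify((22.0, 13.0)))
print(classify((2.5, 1.5)))
```

In a real hybrid pipeline, the vectors would come from static features (call graph shape, permissions) combined with dynamic ones (observed call sequences, argument distributions), and a richer model would replace this nearest-centroid rule.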
This field of study, although recent in the academic and entrepreneurial world, presents convincing results. However, there is no perfect solution and detection systems are regularly challenged and improved.
We see that dynamic analysis is more robust but also more difficult to implement. It can be automated in specific case studies, but this then requires a broader field of knowledge and know-how. However, this type of analysis can be optimized in certain cases by combining it with static analysis. We will explore this point further in our next article.