Guides & Tips — 08 Sep 2022

Why Neural Networks Are Indispensable for Mobile Verification and How to Fit Them All Into a Smartphone App

It seems like nowadays artificial neural networks are responsible for almost everything: they find the best restaurants in town, set plane ticket prices, translate texts into foreign languages, help drive cars, play your favorite song on a smart speaker, and complete all manner of other tasks. So it’s no surprise that such an important sphere as identity verification has also come under the power of artificial intelligence (AI). All the major tasks, like document or biometric recognition, verification against specific parameters, liveness checks, and much more, are handled by machine learning (ML) algorithms and numerous neural networks. Thanks to this, verification takes seconds instead of hours of tedious manual identification.

Neural networks mimic the human brain, and just like the brain, they are typically huge. Most neural networks take up megabytes of memory. This isn’t an issue for server solutions, as servers offer impressive amounts of storage. But when it comes to mobile or embedded solutions, size matters.

Imagine you need to enable identity verification on a smartphone. Let’s say it’s a type of facial recognition technology. A user opens an app, takes a photo and—voilà—gets approval to access a product or service. Sounds easy, right? However, such speedy face detection, recognition, and face quality verification requires around 20 neural networks (or more), since there are multiple parameters to check and validate. And you are lucky if each of these networks weighs circa two megabytes. Simple multiplication shows that in this case you’ll need about 40 megabytes of smartphone memory just to store the neural networks of a single app. Since an average user in the USA, for example, interacts with over 46 applications per month, such an approach is definitely unpromising.

Still, you know for sure that mobile identity verification is a real thing. So how do vendors manage to fit all those ML capacities into a smartphone’s limited storage space? The answer is compressing neural networks. To get the idea, compare it to archiving a large photo album. With neural networks, however, the process is more complicated, and the results differ depending on which compression techniques are used. But let’s start from the beginning.

Why Neural Networks Are So Big

To properly fulfill their functions, neural networks are trained for some time. Just like in a human brain, the neurons that comprise a network signal to each other and process input data to produce the required output. For instance, to be able to prove someone’s identity via face matching, a neural network has to learn to differentiate between millions of human faces.

Exactly as it happens in the brain, after being trained, a neural network contains both required and redundant neural connections. The latter either aren’t used at all or duplicate active connections. And when the training is finally complete, all those redundant connections don’t vanish: they remain deep inside the network and add to its overall size.

How to Cope With Excess and Lose Weight

Once a neural network is properly trained, the next stage begins: finding the right balance between its efficiency and volume. This is a crucial task, as many systems where a neural network will be deployed have memory limits (and this applies not only to mobile platforms; web solutions are memory-sensitive too).

First things first: get rid of all the superfluous connections that a network stores inside. To delete such parameters, developers employ pruning, a procedure widely used to cut off a neural network’s useless weights (parameters). Pruning allows you to reduce the size of a neural network and improve its inference time: as redundant biases are removed and neurons deleted, the network starts to respond faster, taking less time for computation and data processing. In doing so, you get an accurate, storage-savvy neural network that occupies only as much device memory as it needs to function correctly.
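To make the idea concrete, here is a minimal sketch of magnitude-based pruning, one of the most common criteria for deciding which connections are “useless”: weights closest to zero contribute the least, so they are the first to go. The function name and the 70% sparsity figure are purely illustrative; the article doesn’t spell out Regula’s actual pruning criteria.

import numpy as np

def prune_by_magnitude(weights, sparsity):
    # Zero out the smallest-magnitude weights until `sparsity` of them are gone.
    magnitudes = np.abs(weights).ravel()
    k = int(len(magnitudes) * sparsity)
    if k == 0:
        return weights.copy()
    threshold = np.partition(magnitudes, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold                  # keep only strong connections
    return weights * mask                               # pruned copy, dtype preserved

layer = np.random.randn(256, 256).astype(np.float32)
pruned = prune_by_magnitude(layer, sparsity=0.7)        # drop the weakest 70%
print(f"zeroed connections: {np.mean(pruned == 0):.0%}")

In practice, pruning is usually followed by a short round of retraining so the surviving weights can compensate for the deleted ones.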

When Does the Compression Begin?

As good as it sounds, pruning is still not the perfect tool for making a neural network light enough for universal usage on any given platform, including mobile ones. That is why we at Regula stop pruning neural networks when their “capacity” and speed are the highest possible, regardless of their size at that moment.

When this optimal balance between efficiency and volume is found, we move on to another stage: quantization. Technically, it’s the process of reducing the precision of the network’s weights, for example, replacing floating-point numbers (like 0.3333…) with integers. The point is to decrease the variation of weights in the network, which is generally pretty high, making them more similar to each other. Such an approach allows for more efficient compression. The logic derives from archiving algorithms: the lower the variation of data in a file, the higher the compression ratio. For example, a text file containing only one repeating letter will compress at a higher ratio than a file containing various letters. The same applies to compressing neural networks.
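The textbook version of this step is uniform quantization: a layer’s range of weights is divided into a fixed grid, and every weight is snapped to the nearest grid point, so only a small integer code plus a scale and offset need to be stored. Here is a minimal sketch under that assumption (the function names are illustrative, and a production pipeline would be considerably more refined):

import numpy as np

def quantize_uniform(weights, bits=8):
    # Map float32 weights onto a uniform integer grid of 2**bits levels.
    levels = 2 ** bits - 1
    w_min, w_max = weights.min(), weights.max()
    scale = (w_max - w_min) / levels
    if scale == 0:                # degenerate case: all weights identical
        scale = 1.0
    codes = np.round((weights - w_min) / scale).astype(np.uint8)  # valid for bits <= 8
    return codes, scale, w_min

def dequantize(codes, scale, w_min):
    # Recover approximate float weights for inference.
    return codes.astype(np.float32) * scale + w_min

layer = np.random.randn(1000).astype(np.float32)
codes, scale, w_min = quantize_uniform(layer)
restored = dequantize(codes, scale, w_min)
print("max rounding error:", np.abs(layer - restored).max())  # at most scale / 2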

To make the quantization even more effective, we at Regula group similar weights into clusters. This clustering leads to a significant reduction in entropy. As a result, a neural network can be effectively compressed even with an ordinary archiver.
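A generic way to implement such weight clustering is k-means over a layer’s weights, with every weight replaced by its cluster centroid, so only a tiny centroid table plus small, highly repetitive indices remain to be stored. The sketch below uses SciPy’s kmeans2 purely as an illustration of the principle; it is not a description of Regula’s proprietary method:

import numpy as np
from scipy.cluster.vq import kmeans2

def cluster_weights(weights, n_clusters=16):
    # Group weights into n_clusters and replace each one with its centroid.
    flat = weights.reshape(-1, 1).astype(np.float64)
    centroids, labels = kmeans2(flat, n_clusters, minit='++', seed=0)
    shared = centroids[labels].reshape(weights.shape).astype(weights.dtype)
    return shared, labels.astype(np.uint8)   # 16 clusters need only 4 bits per index

layer = np.random.randn(64, 64).astype(np.float32)
shared, labels = cluster_weights(layer)
# Only 16 distinct values remain, so the entropy drops sharply and an ordinary
# archiver (zip, gzip) compresses the result far better than the raw floats.
print("distinct values:", np.unique(shared).size)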

However, with quantization it’s relatively easy to lose a network’s accuracy. So, in most cases, it’s a trade-off between quality and size. But not for Regula.

A Secret Ingredient

The AI industry knows that four- or five-fold compression via quantization (generally from 32 bits, or a bit less, down to 8 bits per weight) does not have a significant impact on the accuracy and quality of neural networks. But even a five-times-compressed neural network won’t fit a mobile solution, at least when it comes to identity verification. You remember those 20 networks required just for facial recognition, right?

In some cases, this would be the end of the story, but Regula’s disruptors have invented unique in-house quantization methods that make ten- and fourteen-fold compression achievable with no compromise on quality. For example, Regula’s neural network that works as a face detector can be compressed 10-12 times. And the network for checking holograms’ texture can pass through quantization and become 12-14 times smaller.

And even this isn’t the limit. Research and innovation are a never-ending process at Regula, and, for some neural networks, the company’s developers and AI experts have managed to attain compression ratios of 23 to 32 times. This means that every weight in a network—and there are usually millions of weights in a single neural network—can be reduced from 32 bits (the maximum) to 1 bit (the minimum).
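One generic way to represent a weight with a single bit is sign binarization: only the sign of each weight is kept, plus one shared scale per layer, as in research approaches such as BinaryConnect and XNOR-Net. The sketch below illustrates only the storage arithmetic behind the 32-fold figure; it is not Regula’s actual method, which the article doesn’t disclose:

import numpy as np

def binarize(weights):
    # Keep only the sign of each weight (1 bit) plus one shared scale per layer.
    scale = np.abs(weights).mean()
    signs = weights >= 0                  # boolean array, 1 bit of information each
    packed = np.packbits(signs.ravel())   # 8 weights per stored byte
    return packed, scale, weights.shape

def restore(packed, scale, shape):
    # Rebuild approximate weights: +scale where the bit is 1, -scale where it is 0.
    bits = np.unpackbits(packed)[: np.prod(shape)]
    return np.where(bits, scale, -scale).astype(np.float32).reshape(shape)

layer = np.random.randn(300).astype(np.float32)
packed, scale, shape = binarize(layer)
print("bytes before:", layer.nbytes, "after:", packed.nbytes)  # roughly 32x smaller

Stored this way, a layer of N float32 weights shrinks from 4N bytes to about N/8 bytes plus one float, which is exactly where the 32-fold upper bound comes from.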

Thanks to such breakthroughs, Regula’s SDKs for both document and biometric verification can be easily embedded in any mobile solution, regardless of how many neural networks they require to function. Moreover, new networks can be added immediately as new tasks arise, since they can be compressed manifold without losing accuracy or speed, and the impact on device or system memory remains negligible.
