You wouldn't download an AI

Altay Akkus

Dec 27, 2024

Extracting AI models from mobile apps

12 Comments

Abhishek

Jan 6

This is pretty cool! I wonder how common it is for these big companies to have some pathway into their code.

This article felt like going on an adventure!

Altay Akkus

Jan 6

If you are shipping a mobile app, Android or iOS, a pathway to _their_ code is inevitable since it's running on _your_ device :)

The newest iOS versions are harder to jailbreak, but I have iOS 15.8 running with root & Frida just fine :)

John Smith

Jan 5

Could you explain to a layman what further use there would be for such a model contained in a .tflite file? Would it be possible to edit it somehow? Say, to enrich the list of currency bills it recognizes and then pack that file back into the .apk, so the actual app would work with a bigger list of bills?

But the initial .apk has probably been signed, and rehashing it by injecting your own versions of internal files would result in an .apk whose signature no longer matches...
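That intuition is easy to check with the older v1 (JAR) signing scheme, where `META-INF/MANIFEST.MF` stores a digest of every file in the archive, so swapping in a new model file breaks the match. A minimal sketch, assuming an APK whose v1 manifest uses `SHA-256-Digest` entries (older APKs may use `SHA1-Digest`, and APK Signature Scheme v2 and later sign the whole file, which this does not check). `manifest_digests` and `modified_entries` are hypothetical helpers, not part of any Android tool; for real verification, `apksigner verify` from the Android SDK build tools is the proper tool.

```python
import base64
import hashlib
import zipfile


def manifest_digests(apk_path):
    """Parse META-INF/MANIFEST.MF into {entry name: base64 SHA-256 digest}."""
    with zipfile.ZipFile(apk_path) as apk:
        text = apk.read("META-INF/MANIFEST.MF").decode("utf-8")
    # Normalize line endings, then join continuation lines (they start with one space).
    text = text.replace("\r\n", "\n").replace("\n ", "")
    digests, name = {}, None
    for line in text.split("\n"):
        if not line:
            name = None  # a blank line ends the current section
        elif line.startswith("Name: "):
            name = line[len("Name: "):]
        elif line.startswith("SHA-256-Digest: ") and name is not None:
            digests[name] = line[len("SHA-256-Digest: "):]
    return digests


def modified_entries(apk_path):
    """List entries that are missing or whose SHA-256 no longer matches the manifest."""
    expected = manifest_digests(apk_path)
    changed = []
    with zipfile.ZipFile(apk_path) as apk:
        present = set(apk.namelist())
        for name, digest in expected.items():
            if name not in present:
                changed.append(name)
                continue
            actual = base64.b64encode(hashlib.sha256(apk.read(name)).digest()).decode("ascii")
            if actual != digest:
                changed.append(name)
    return changed
```

Running `modified_entries` on an APK whose model file was replaced would list that file, which is exactly why the app has to be re-signed with your own key afterwards.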

Altay Akkus

Jan 5

Oh sorry, forgot the second one:

If you now had a model which recognizes more bills, you would have to adjust the application logic and label list in the Android application.

The signature would be invalidated, but you can sign it yourself and install it on your device, or publish it somewhere. In general, though, decompiling an APK with apktool and rebuilding it into a working APK is a messy job, and more often than not it fails somewhere.

But TensorFlow Lite has several open-source example projects that use TFLite to run a model on camera input. I would advise you to build your own app instead.

https://developer.android.com/codelabs/digit-classifier-tflite#0

Altay Akkus

Jan 5

I think there are a lot of people who are far more knowledgeable about neural nets and TensorFlow, but:

The .tflite file essentially describes the model (how many input layers, neurons in between, output layers, etc.) and also contains the trained weights and biases.

Regarding adding more bills: I think this will be very hard, because the model has a fixed number of “outputs”. But as I said, my knowledge of neural nets is no greater than yours after watching the 3Blue1Brown videos. It would be insanely interesting, though, to work on a “compiled” neural net :)
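To see that structure for yourself, here is a minimal sketch built around TensorFlow's `tf.lite.Interpreter`, assuming the `tensorflow` package is installed; `describe_model` is a hypothetical helper. It lists the input and output tensors and counts all tensors stored in the file. For a typical classifier, the output shape is also where the fixed number of outputs shows up: an output of shape [1, N] can only ever score N classes, so adding a bill means retraining rather than editing a list.

```python
def describe_model(interpreter):
    """Summarize a TFLite model's inputs, outputs and stored tensors.

    `interpreter` is expected to behave like tf.lite.Interpreter
    (get_input_details / get_output_details / get_tensor_details).
    """
    def fmt(detail):
        shape = [int(d) for d in detail["shape"]]
        dtype = getattr(detail["dtype"], "__name__", str(detail["dtype"]))
        return f'{detail["name"]}: shape={shape} dtype={dtype}'

    lines = ["inputs:"]
    lines += ["  " + fmt(d) for d in interpreter.get_input_details()]
    lines.append("outputs:")
    lines += ["  " + fmt(d) for d in interpreter.get_output_details()]
    # Every tensor in the file: weights, biases and intermediate activations.
    tensors = interpreter.get_tensor_details()
    lines.append(f"tensors (weights, biases, activations): {len(tensors)}")
    return "\n".join(lines)
```

Usage would look like `interpreter = tf.lite.Interpreter(model_path="currencies.tflite")` followed by `print(describe_model(interpreter))`.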

Sok Puppette

Jan 5

> Keep in mind that AI models, like most things, are considered intellectual property. Before using or modifying any extracted models, you need the explicit permission of their owner.

This is not legally established *anywhere*, and is on shaky legal ground *everywhere*. Copyright covers works of authorship, which ML models are not. Please don't go around giving legal advice when you have no clue.

Altay Akkus

Jan 5

You are probably right that there has been no major ruling on whether weights and biases are considered IP. In the hypothetical scenario that someone leaked OpenAI's GPT-4o or something, I would think that lawmakers would not allow every other company to just use it without any repercussions.

But “shaky ground” is 100% correct. My intention was only to make clear that, if people do break the law, it was not at my direct instruction.

My home country, Germany, has really archaic laws regarding “hacking” and such, so this ain’t legal advice, just a “don’t sue me please” notice :)

Mike

Jan 5

ML models included in a copyrighted work such as an application are copyrighted with the application. Period. You can't say this part of the application is not copyrighted.

You don't know how they trained that model, but if you use it, it's possible that they deliberately trained it to recognize something non-monetary in order to detect copycat uses. For example, they could train it with a picture of a fake bill bearing the founder's head and label it as a $100,000 bill. If they put that in the model, and you then use their model in a service, they have a means to detect your copyright violation, and their suit will likely hold up in a court of law.

It can cost millions of dollars to train these models. You can bet 100% that they have invested something in being able to detect copycat uses and sue. They invested some amount of work in encrypting this model, which you circumvented by tweaking the device API to dump the secret. It's likely not the only protection they have.
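The kind of planted check described above is usually called model watermarking with trigger (backdoor) inputs. A minimal sketch of how an owner might test a suspect model, assuming they can query it as a black box: `predict` stands in for that query, `triggers` for the secret (input, planted label) pairs, and the 0.9 threshold is an arbitrary illustrative choice. All of these names are hypothetical.

```python
def watermark_match_rate(predict, triggers):
    """Fraction of secret trigger inputs for which `predict` returns the planted label.

    predict:  callable mapping an input to a predicted label (the suspect model).
    triggers: list of (input, planted_label) pairs known only to the model owner.
    """
    if not triggers:
        raise ValueError("need at least one trigger")
    hits = sum(1 for x, planted in triggers if predict(x) == planted)
    return hits / len(triggers)


def looks_copied(predict, triggers, threshold=0.9):
    """Flag a model that reproduces the planted (nonsense) labels most of the time."""
    return watermark_match_rate(predict, triggers) >= threshold
```

If an independently trained N-class model spread its guesses on such off-distribution inputs evenly, it would hit a planted label only about 1/N of the time, so a rate near 1 points strongly at a copy.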

Altay Akkus

Jan 6

IANAL, but I think it is questionable whether this is subject to copyright law, even though you wrote that it is.

But regarding the easter eggs: I agree, there are probably some shenanigans like that going on, but I think this could lead to problems in training. There was some research on CV models where they ran an FFT on the weights and biases, which suggested that the "learned" formula is essentially a pipeline of Kalman filters feeding into each other. Feeding your model a bunch of pictures of a USD note but also slipping in something entirely different may, or certainly will, mess with the overall accuracy of your model.

Would be awesome if you could try this one out with some generic model from Kaggle, I can't afford to get nerdsniped again :D

But it reminded me of ghost towns: nonexistent towns that map providers insert into their maps, which provide evidence when someone steals their map data.

John

Jan 7

Thanks for the insightful post. Do you know of a way to better encrypt a model so it will be harder (hopefully impossible) to break into?

tbwm2

Jan 5

It would be great if you could give an example of how one could use this tflite file. How far is this from running `tensorflow currencies.tflite < image.jpg` and getting "100USD"?

Altay Akkus

Jan 6

Essentially that, but with extra steps :)

Check out this Python script: it loads the model and the labels from files and then runs inference with TFLite. It basically loads the model into TF, scales the pictures down to the required dimensions, runs the model on the pictures, and then matches the output to a label.

https://github.com/JerryKurata/TFlite-object-detection/blob/main/TFLite_detection_image.py
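Those steps can be sketched for a single-output image classifier as follows. This is not the linked script (which does object detection); it is a minimal sketch assuming TensorFlow, NumPy and Pillow are installed, a labels file with one label per line, and an input of shape [1, height, width, 3]. `load_labels`, `top_label` and `classify` are hypothetical helper names, and the [-1, 1] scaling for float models is a guess you would need to check against the actual model.

```python
from pathlib import Path


def load_labels(path):
    """Read one label per line, skipping blank lines (assumed file format)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def top_label(scores, labels):
    """Match the highest-scoring output index to its label."""
    scores = list(scores)
    if len(scores) != len(labels):
        raise ValueError(f"model has {len(scores)} outputs but {len(labels)} labels")
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]


def classify(model_path, image_path, labels):
    """Classify one image with a single-output TFLite model (sketch)."""
    # Imported here so load_labels/top_label work without these packages installed.
    import numpy as np
    import tensorflow as tf
    from PIL import Image

    interpreter = tf.lite.Interpreter(model_path=str(model_path))
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    # ASSUMPTION: input shape is [1, height, width, 3] (NHWC, RGB).
    _, height, width, _ = (int(d) for d in inp["shape"])
    image = Image.open(image_path).convert("RGB").resize((width, height))
    pixels = np.asarray(image)[np.newaxis, ...]  # add the batch dimension

    if inp["dtype"] == np.uint8:
        data = pixels.astype(np.uint8)  # quantized models usually take raw 0-255 pixels
    elif inp["dtype"] == np.float32:
        # ASSUMPTION: float models expect pixels scaled to [-1, 1]; check the model.
        data = (pixels.astype(np.float32) - 127.5) / 127.5
    else:
        raise NotImplementedError(f"unhandled input dtype {inp['dtype']}")

    interpreter.set_tensor(inp["index"], data)
    interpreter.invoke()
    scores = interpreter.get_tensor(out["index"])[0]
    return top_label(scores, labels)
```

Usage would be roughly `classify("currencies.tflite", "image.jpg", load_labels("labels.txt"))`, which returns a (label, score) pair such as the "100USD" from the question, provided the label order matches the model's outputs.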

Altay's Blog