Learning Machine Learning Part 3: Attacking Black Box Models

In the first post in this series we covered a brief background on machine learning, the Revoke-Obfuscation approach for detecting obfuscated PowerShell scripts, and my efforts to improve the dataset and models for detecting obfuscated PowerShell. We ended up with three models: an L2 (Ridge) regularized Logistic Regression, a LightGBM Classifier, and a Neural Network architecture.
The second post covered attacking these models from a white box perspective, i.e., where we have the entirety of the trained model itself, including the input features, model architecture, model parameters/weights, and training data. I highly recommend at least skimming those first two posts before proceeding to make sure this all makes as much sense as possible.

In this post we're going to cover the more common, and more difficult, black box perspective. Here we only know what features are being extracted from each sample – even the architecture will remain opaque to us.
I've been waiting for months to use this meme.
Background
After reading what was definitely hundreds of pages of academic research on adversarial machine learning, I can safely say that a good chunk of the research has been from a white box perspective. Remember our definitions of white box and black box attacks from the second post in this series:

A white box attack is one where we know everything about the deployed model, e.g., inputs, model architecture, and specific model internals like weights or coefficients.
A black box attack is one where we only know the model's inputs, and have an oracle we can query for output labels or confidence scores. An "oracle" is a commonly used term in this space that just means we have some sort of opaque endpoint we submit our inputs to that then returns the model output(s).

Also, much of the research appears to have been in the realm of image recognition. While that is interesting, it's definitely a different problem space than the one we're dealing with. Specifically, images can have a number of pixels perturbed by a small amount without the resulting adversarial image appearing modified to the human eye. For a lot of the problems we're dealing with in security, for example our PowerShell obfuscation problem space, we're more limited in a) the number of features we can modify and b) to what degree we can modify said features. That is, we have a smaller functional subspace of modifications we can make to PowerShell scripts versus images.
Various black box attacks involve model extraction (see the next section) to create a local model, sometimes known as a substitute or surrogate model. Existing attacks are then executed against the local model to generate adversarial samples, with the hope that these samples also evade the target model. This often works due to the phenomenon of attack transferability, which we'll talk about shortly.
Black box attacks can also skip model extraction and directly query inputs against the target model. These attacks, where the internal configuration of the model is not needed at all, are what's actually known as black box attacks in the academic literature. However, by using model extraction we can potentially apply white box attacks against local clones of black box models where we only have an oracle to submit inputs to and get labels from.
Model Extraction
Model extraction, according to Will Pearce and others, is one of the most fundamental primitives in adversarial ML. While the idea was likely around for a while, I believe the first formalizations of model extraction (or at least the ones that popularized the technique) were the 2016 paper "Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples" and the 2017 paper "Practical Black-Box Attacks against Machine Learning", both from Papernot et al. The general summary of their approach from the 2017 paper is:
Our attack strategy consists in training a local model to substitute for the target DNN [Deep Neural Network], using inputs synthetically generated by an adversary and labeled by the target DNN. We use the local substitute to craft adversarial examples, and find that they are misclassified by the targeted DNN.
The whole idea is to approximate the target model's decision boundary with less (and usually different) data than the model was originally trained on. Basically, model extraction involves first submitting a number of known samples to the model, which functions as a labeling oracle. Imagine submitting a bunch of binaries to some sort of website that tells you whether the binaries are malicious or not. Or imagine having our adapted Revoke-Obfuscation models as some sort of internal API, where we can submit our feature measurements and get back a label of normal or obfuscated, or a probability-of-obfuscation score. With enough inputs, we can train a local substitute model that functions similarly to the target model.
Figure 1 from "Active Deep Learning Attacks under Strict Rate Limitations for Online API Calls" by Shi et al. summarizes the process nicely:
Figure 1 from https://arxiv.org/pdf/1811.01811.pdf
A reasonable hypothesis is that the closer we can match the original model architecture, the better our local model will perform. This is something we'll be exploring in this post.
A slightly different approach involves training an initially poor model with few samples, and using some of the white box attack techniques described in the second post to generate adversarial samples. These samples are run through the classifier, and as described by this post:
…the adversarial examples are a step in the direction of the model's gradient to determine if the black box model will classify the new data points the same way as the substitute model. The augmented data is labeled by the black box model and used to train a better substitute model. Just like the child, the substitute model gets a more precise understanding of where the black box model's decision boundary is.
End result either way? We have a locally trained model that approximates the target model's decision boundary. With it, we can perform the various white box attack algorithms that exploit internal model gradients, in addition to any black box attacks as well.
Sidenote: Inputs and Model Architectures
If the inputs to the model you're attacking are images or text, in some ways you're in luck, as you can likely guess the target model's base architecture. There are established guidelines for these types of inputs, i.e., Convolutional Neural Networks for images and LSTMs/Transformers (or Naive Bayes in specific cases) for text. In these examples we're going to work with tabular data, meaning data that's displayed in columns or tables. We'll hopefully revisit attacking text-based models another time!
Attack Transferability
You might be asking, "Really? Attacks against crappy locally cloned models can work against real production models?" The answer is YES, due to a phenomenon called attack transferability. The 2019 paper "Why Do Adversarial Attacks Transfer? Explaining Transferability of Evasion and Poisoning Attacks" by Demontis et al. explores this from an academic perspective, but I'll do my best to explain the concept. Also, considering that this paper is only a few years old and there's not a general consensus as to why adversarial attacks transfer, remember that this is still somewhat of an open question.
The seminal work that introduced this concept is the previously mentioned 2016 paper "Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples" by Papernot, McDaniel, and Goodfellow. The first few sentences from the abstract give a good overview of the concept (emphasis mine):
Many machine learning models are vulnerable to adversarial examples: inputs that are specially crafted to cause a machine learning model to produce an incorrect output. Adversarial examples that affect one model often affect another model, even if the two models have different architectures or were trained on different training sets, so long as both models were trained to perform the same task. An attacker may therefore train their own substitute model, craft adversarial examples against the substitute, and transfer them to a victim model, with very little information about the victim.
Their paper sets out to prove two hypotheses, namely that "Both intra-technique and cross-technique adversarial sample transferabilities are consistently strong phenomena across the space of machine learning techniques" and that "Black-box attacks are possible in practical settings against any unknown machine learning classifier." Their paper makes a compelling case for each, and also demonstrates the transferability of different model classes, summarized by Figure 3 on page 5 of the paper:
Figure 3 from page 5 in https://arxiv.org/pdf/1605.07277.pdf
The values in each cell are the percent of samples (MNIST images here, the de facto test case for adversarial attacks) crafted to evade a specific model architecture that, when applied to another model architecture, also changed their classification label. That is, the percentage of successful locally crafted adversarial samples that also fool the target model. Note that this figure doesn't include Random Forests or Boosted Decision Tree ensembles (the Ens column is a custom ensemble of the five existing techniques). The substitute model type is on the left side, and the model type being targeted is on the bottom. We can see some patterns:

In general, the closer you match the architecture, the better the evasion is likely to be.
Logistic Regression models (LR) make a good substitute for other Logistic Regressions, Support Vector Machines (SVM), and Decision Trees.
Decision Trees are the most vulnerable to attacks, with attacks from every architecture transferring well.
The most resilient architecture is the Deep Neural Network (DNN).

From this, my theory is that if you can broadly match the target model's architecture, you have a better chance that your attacks against the substitute will transfer.
How will this hold up against our example datasets?
Attacking the Black Box
Our goal here is to recreate a local substitute model while only having access to the target model as a labeling oracle (i.e., the trained target models from the first post). Then we'll execute the white box attacks from the second post against our substitute, hoping for sufficient attack transferability. While this is very similar to the processes in the first two posts, we have a few extra steps.
First, we need a new dataset to use for the model extraction. I selected 1500 random files from the PowerShellCorpus and ran each through a random set of obfuscations from Invoke-Obfuscation, which gave me 3000 total samples. I then ran the feature extraction code against each script and generated the BlackBox.ast.csv file that's now updated in ./datasets/ in the Invoke-Evasion repository.
The next step is model extraction, where we train a surrogate local model on the dataset labeled by the target model. To achieve this I used each target model to generate a respective set of labels for the new dataset. While these labels are not the actual ground truth, as none of our models were perfect, the labels reflect the decision boundary of the target model itself. I split the dataset into a standard train/test split with an 80/20 ratio, like we did in the first post.
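To make that concrete, here's a minimal sketch of what this labeling-plus-surrogate-training step can look like. Names like target_model and X_new are placeholders for the oracle API and the extracted AST features, not the exact code from the Invoke-Evasion notebooks:

```python
# Minimal model-extraction sketch. target_model (the black box oracle) and
# X_new (the features extracted from the 3000 new samples) are placeholders.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# ask the oracle for labels; we never see its internals, only its outputs
y_oracle = target_model.predict(X_new)

# standard 80/20 split, as in the first post
X_train, X_test, y_train, y_test = train_test_split(
    X_new, y_oracle, test_size=0.2, random_state=42, stratify=y_oracle
)

# train a local surrogate on the oracle's labels
surrogate = LogisticRegression(max_iter=5000)
surrogate.fit(X_train, y_train)

# "extraction quality": how closely the surrogate agrees with the oracle
print("agreement with oracle:", surrogate.score(X_test, y_test))
```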
Model extraction in meme form.
In the previous section I mentioned that the better you match your local model to the target model's architecture, the higher the chances are that your crafted attack will fool the target. I wanted to see what "model reconnaissance" steps might help shed light on the target model's architecture. The big question in my mind is determining whether the target is linear, tree-based, a Neural Network, or some third option. Tree-based algorithms generally work extremely well with just about any dataset, so my hypothesis is that Random Forests and Gradient Boosted Trees will fit well against every target model's dataset. Because of this, we ideally want to determine whether the model is likely a Neural Network or linear first, with tree-based being the result of the process of elimination.
This is definitely an open question and something I don't think has been heavily considered in academia. However, I do want to reiterate again that I'm not an expert here - if there's existing work in this area (or anyone has more ideas) please let me know!
My two main ideas, which I'll detail shortly, are:

Training several substitute models for each target-labeled dataset, then generating adversarial samples using the HopSkipJump attack from the Adversarial Robustness Toolbox. I'll detail this more in a following section, but just know for now that it's a way to generate adversarial samples for any black box model.
Testing heavy modification of a single important feature against the target to see if the model might be linear.

I started by fitting the following models on each target-labeled dataset, doing a basic cross-validated random search (using RandomizedSearchCV) over common hyperparameters for the shallow algorithms (i.e., everything but the Neural Network); a quick sketch of this kind of search follows the list:

Logistic Regression
Support Vector Classifier with the radial basis function (rbf) kernel
Gaussian Naive Bayes
Random Forest
Boosted Trees (XGBoost)
One layer Neural Network with dropout (basic architecture, no hyperparameter tuning)
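Here's roughly what one of those cross-validated random searches looks like for the Random Forest surrogate. The parameter ranges are illustrative; the actual grids in the Invoke-Evasion notebooks may differ:

```python
# Hypothetical hyperparameter search for one of the shallow surrogate models.
from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

param_distributions = {
    "n_estimators": randint(100, 600),
    "max_depth": [None, 4, 8, 16, 32],
    "min_samples_leaf": randint(1, 10),
    "max_features": ["sqrt", "log2", None],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=25,        # number of random parameter combinations to try
    cv=5,             # 5-fold cross-validation
    scoring="f1",
    n_jobs=-1,
    random_state=42,
)
search.fit(X_train, y_train)   # the oracle-labeled split from earlier
rf_surrogate = search.best_estimator_
```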

I then used HopSkipJump to generate adversarial samples for each model. For some reason I wasn't able to get the Neural Network to properly generate enough samples using HopSkipJump, so I used the Fast Gradient Method (FGM) attack instead. Why choose these specific architectures as the local models? I wanted to select a range of things actually used in production, and wanted coverage of linear (Logistic Regression), tree ensembles (Random Forests/Boosted Trees), and Neural Networks.
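Generating those samples with ART looks roughly like the sketch below, continuing with the placeholder rf_surrogate from above. The iteration and evaluation counts are illustrative, not the exact values I used:

```python
# Illustrative use of ART's black box HopSkipJump attack against a fitted
# scikit-learn surrogate; iteration/evaluation counts here are arbitrary.
from art.estimators.classification import SklearnClassifier
from art.attacks.evasion import HopSkipJump

art_clf = SklearnClassifier(model=rf_surrogate)

attack = HopSkipJump(
    classifier=art_clf,
    targeted=False,     # just flip the label, no specific target class
    max_iter=10,
    max_eval=1000,
    init_eval=100,
)

# craft adversarial versions of some of the held-out samples
X_adv = attack.generate(x=X_test[:50])
```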
Memes Memes Memes
These adversarial samples were run against each local model and the target model to get comparable accuracies. More importantly, however, I pulled out the specific samples misclassified by each local model. These samples were run against the target model, giving the percentage of adversarial samples crafted against the local model that also fooled the target model. This is the transferability idea we talked about earlier. While how effective all of the local adversarial samples were against the target is an interesting data point, what we really care about is how effective the local surrogate model is at crafting adversarial samples that fool the target model.
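In rough Python, the measurement I care about is something like this (placeholder variable names, continuing from the earlier sketches and assuming numpy arrays):

```python
# Rough sketch of the transferability measurement (placeholder variable names).
import numpy as np

local_preds  = rf_surrogate.predict(X_adv)    # surrogate's labels for the adversarial samples
target_preds = target_model.predict(X_adv)    # oracle's labels for the same samples
y_true       = y_test[:50]                    # original (pre-perturbation) labels

# samples that successfully evade the local surrogate
evades_local = local_preds != y_true

# of those, the fraction that also evade the target model
transfer_rate = np.mean(target_preds[evades_local] != y_true[evades_local])
print(f"attack transferability: {transfer_rate:.1%}")
```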
Next, I took the best performing Logistic Regression model, which is linear, and heavily modified a single heavily-weighted feature for a sample to see if this affected the model output. The goal here was to see whether the model was likely linear, for another point of reference.
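The probe itself is simple. Here's a sketch of the idea, assuming a fitted Logistic Regression surrogate called lr_surrogate, numpy feature arrays, and an oracle that returns a probability (with hard labels you'd instead watch for the point where the label flips):

```python
# Crude linearity probe: push one influential feature to extreme values and
# watch how the target's predicted probability moves. Feature index and
# magnitudes here are arbitrary.
import numpy as np

sample = X_test[0:1].copy()
feature_idx = int(np.argmax(np.abs(lr_surrogate.coef_[0])))   # most-weighted feature

for scale in [1, 10, 100, 1000]:
    probe = sample.copy()
    probe[0, feature_idx] = sample[0, feature_idx] * scale
    prob = target_model.predict_proba(probe)[0, 1]
    print(f"x{scale:>5}: P(obfuscated) = {prob:.3f}")

# A roughly monotonic response that keeps moving (and eventually saturates)
# suggests a linear model; a flat or step-like response hints at a tree ensemble.
```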
Target Model1 (Logistic Regression)
Here's the result of our evasion-transferability search process against the first model:
Attack transferability for the tuned Logistic Regression model.
These results are interesting. The model that most closely matched the target model's labeled data was the Random Forest, the Logistic Regression surrogate (which matched the target model's architecture most closely) had the largest number of adversarial samples transfer to the target model, but XGBoost had the highest percentage of adversarial samples that transferred to the Logistic Regression target.
Target Model2 (LightGBM)
Here's the result of our evasion-transferability search process against the second model:
Attack transferability for the tuned LightGBM model.
These results are also interesting. The models that most closely matched the target model's labeled data were the Random Forest and XGBoost, while the Neural Network had both the largest number of adversarial attacks that transferred as well as the highest attack transferability percentage to the LightGBM target. Compared to the other two target models, Model2 also appears to be the most resilient against adversarial samples.
Target Model3 (4 Layer Neural Network)
Here's the result of our evasion-transferability search process against the third model:
Attack transferability for the Neural Network.
These results are just as interesting as the previous two. Most models were extremely close in accuracy on the target model's labeled data, while the Logistic Regression and XGBoost had the largest number of adversarial samples transfer to the target Neural Network. The Neural Network had the highest attack transferability percentage, but the second lowest number of attacks transferred.
Based on the results, my strategy of determining model architecture by measuring surrogate model attack transferability doesn't seem to be working too well in this case. As a confirmation, for each target I took the most promising surrogate and performed the same attacks that we ran against the white box models in the second post. This didn't end up being very effective, and I ended up with a large number of modifications for each script. Specifically, the modified scripts would score very low in obfuscation probability against the local surrogate but would still be classified as obfuscated by the target.
I racked my brain for an explanation, and Will was able to help clarify a few things for me. This resulted in two major shifts in my approach: soft labels, and flipping my perspective of the entire problem on its head.
Attacking the Black Box Take 2 — Soft Labels
Will details the effect of hard vs. soft labels on model extraction on slide 21 of his 2021 BlackHat EU presentation:
Slide 21 from “Zen and the Art of Machine Learning”
Intuitively, this makes sense. A hard label like normal or obfuscated provides less information than a set of probabilities like obfuscated=0.73. The more information we have, the better we can fit the internals of the model - there's a lot of space between hard binary labels like 0 and 1!
Let's consider the scenario where our magic Invoke-Obfuscation APIs give a probability of obfuscation like 0.73 instead of just a binary 0/1 label. How do we take advantage of this extra information? Remember back to the first post where we talked about classification versus regression: classification gives us labels, while regression gives us a number!
Let's see what happens if we label our adversarial dataset with the probability that the script is obfuscated, instead of just the hard normal/obfuscated label. We'll keep a threshold of .5 or greater to mean the sample is obfuscated, but this can be adjusted (and often is, to change the balance of false positives and false negatives).
Since we're doing regression instead of classification, we need a slightly different set of algorithms. Neural Networks, Random Forests, Gradient Boosted Trees, and Support Vector Machines all have regression equivalents. However, instead of Logistic Regression (confusing name for a classifier here, I know) we're going to use classic Linear Regression, Lasso Regression (L1), Ridge Regression (L2), and BayesianRidge Regression in place of the Naive Bayes. Then for each, we'll look at the Root Mean Squared Error (RMSE) on the test set, a common regression metric that squares the difference between each prediction and its actual value, averages all of those squares, and takes the square root of the result:
Ref — https://towardsdatascience.com/what-does-rmse-really-mean-806b65f2e48e
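For concreteness, that's the standard definition:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}
```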
We'll also take each regression model, make a prediction for each sample, and turn these back into hard labels by checking which are equal to .5 or above. This lets us get an accuracy measurement. This whole thing is kind of a classifier-approximating-regressors type of approach. The details are in the BlackBox.ipynb notebook in the Invoke-Evasion repository.
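A minimal sketch of the classifier-approximating-regressors idea, using Ridge as a stand-in for the several regressors the notebook actually tries, and y_proba_train/y_proba_test as placeholders for the oracle's probability-of-obfuscation scores:

```python
# Sketch: fit a regressor to the oracle's soft labels, then threshold its
# predictions back into hard labels. Ridge is just one stand-in here.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, accuracy_score

reg = Ridge(alpha=1.0)
reg.fit(X_train, y_proba_train)

pred = reg.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_proba_test, pred))

# turn the regression output back into hard labels at the 0.5 threshold
hard_pred = (pred >= 0.5).astype(int)
hard_true = (y_proba_test >= 0.5).astype(int)
acc = accuracy_score(hard_true, hard_pred)

print(f"RMSE: {rmse:.4f}  |  thresholded accuracy: {acc:.4f}")
```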
Meme break.
One issue I ran into is that since we're building regression models instead of classification models, we can't use them out of the box with HopSkipJump or other attack algorithms. I made some attempts at rolling a custom classification model that wrapped the regression scikit-learn model, but ART still didn't work with them properly. I'm sure there's a way to do this, but there's still a major issue we haven't considered yet…
Attacking the Black Box Take 3 — The Real Problem
A huge issue I ran up against while trying to wrap my head around the adversarial machine learning scenario here is how to turn the modified adversarial numerical dataset back into a working script. Like I've mentioned throughout this post series, most academic adversarial machine learning research has been concerned with image data. Adversarial feature perturbation for images is pretty straightforward, or rather unconstrained - we just tweak the bits for pixels and then submit the image to our target classifier. Pretty much all adversarial machine learning evasion algorithms are used this way: you supply an array of data samples and an attack algorithm, and perturbed/adversarial samples are produced, i.e., arrays of numbers that fool the target model when processed.
We explored the possibility of feature masking for various attack algorithms in the second post on attacking white box models. Even though we constrained the modifications to a smaller subset of more-easily-modifiable features, this still isn't ideal. If we have an array of features modified from an original sample, how do we a) turn this back into a script that b) runs at all and c) still executes the script's intended functionality?
Like we talked about in the second post, what is the ??? process in the following figure:
Adversarial ML for PowerShell script obfuscation, take 1.
I was having difficulty wrapping my head around this until I read some of the source for the mlsecevasion branch of Counterfit and had another conversation with Will that completely changed my perspective. He relayed a key insight from Hyrum Anderson: this is all really just an optimization problem!
Black box ML attack algorithms are optimizing the measured input features for the max-min adversarial problem we talked about in the second post, where we want to maximize the error function/loss of the model for a sample while minimizing the number of modifications needed to do so. Instead of optimizing the modification of the vectorized features directly, why don't we optimize a number of sequential actions that affect those features?
"It's all so obvious now Will! /s" - you, the reader
Basically, we want to first enumerate a number of modification actions we can run against a specific sample that change the features extracted for said sample. For example, Counterfit has a set of PE section adds/overlays, imports to add, and timestamps to try for PEs. For our scenario, we'd want something that adds "normal" features, and we can use the explainability approaches from the second post to guide this process. Then we can use an algorithm like HopSkipJump to find a combination of these features that produces the result we want.
Our approach will instead look like this:
Adversarial ML for PowerShell script obfuscation, take 2.
In Counterfit's case, they're taking a vector that represents the changes to perform and feeding it into something like HopSkipJump. For some future work I hope to build a Counterfit module for this PowerShell obfuscation, but we'll keep things a bit more manual for now.
There's another alternative to using attack algorithms: Bayesian Optimization, "a sequential design strategy for global optimization of black-box functions." Remember back in the first post when we briefly mentioned Optuna, a framework built for tuning hyperparameters for machine learning algorithms. Optuna implements various Bayesian optimization techniques and is super flexible, with a general approach of (a toy usage sketch follows the list):

Define an objective function. This function contains the search space for the parameter combinations you're looking for and returns one or more objective values that need to be minimized or maximized.
Conduct a number of trials as part of a study, where you limit the number of trials or the total running time of the study.
Let the math do the rest, producing an optimized set of parameters for our function!
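If you haven't used Optuna before, that pattern looks roughly like this toy example (not our actual objective):

```python
# Toy illustration of the define-objective / run-study Optuna pattern.
import optuna

def objective(trial):
    # 1. define the search space and compute the value to optimize
    x = trial.suggest_float("x", -10.0, 10.0)
    return (x - 2) ** 2          # something to minimize

# 2. run a bounded number of trials as a study
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=100)

# 3. let the math do the rest
print(study.best_params)
```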

The other key insight here is that we don't have to use Optuna just for hyperparameter tuning! The objective function represents a black box, and we can use it to execute a number of modification actions against our target script and return the obfuscation probability score from our target model. This is an approach Counterfit implemented as well, using the Hyperopt framework instead of Optuna.
If this sounds like reinforcement learning, you're not completely off base. Not being as literate in the relevant math as I would like, I'll state that I believe this approach is similar to reinforcement learning in principle but different in practice. My gut feeling is that this approach will likely work for smaller sets of atomic modifications that don't depend as much on order - larger numbers of sequential actions that are very context dependent will likely perform better with a proper reinforcement learning approach. However, since we have a short feedback loop of select transforms-obfuscate script-measure obfuscation, we likely don't need the increased overhead of a complete reinforcement learning approach.
Optimizing Obfuscation Evasion With Our Black Box
We're going to put everything together here, combining the classifier-approximating-regressors approach with Bayesian optimization of script modifications.
Our Overall Goal: take the obfuscated DBOdemo1.ps1 input script and return a modified script that preserves the original script's functionality while being flagged as normal by our tuned Neural Network model. We also want to add as few characters to the script as possible.
Final meme, I promise.
Our full approach from start to finish will be:

Generate a reasonable number of obfuscated and normal samples (the positive and negative classes) and extract the vectorized features from each sample. In our case these are the 446 "most important" features selected via Boruta-Shap in the first post.
Run the features for the vectorized samples through the target model, which returns the probability of obfuscation. This produces a dataset labeled by the target "oracle" model.
Train a number of local regression models to fit the probability scores of the training dataset as closely as possible, measuring final performance against a hold-out test set. This step and the last constitute the Model Extraction part of the attack chain.
Extract feature importances from the best performing model to give us the most impactful features.
Examine features where the mean of the feature value is larger for normal samples than for obfuscated samples. This is because we want to add normal-looking code to an existing obfuscated sample (a sketch of steps 4 and 5 follows this list).
Using this insight about the most normal features, manually construct several "script transforms" that add more normal script data to the top of an obfuscated script. We want these transform functions to incorporate insights from the feature importances, but also preserve the original functionality of the code. Essentially we want this code to act like script NOPs, but ones that influence the model as much as possible towards normal.
Build an objective function that takes an input script and applies a number of script transforms, returning the number of characters added and the obfuscation probability score from the target model instead of the local surrogate model.
Run an Optuna optimizer study that minimizes both of the values returned by the objective function, a multi-objective optimization.
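As a rough sketch of steps 4 and 5, assuming the best performing surrogate is a tree-based regressor exposing feature_importances_ (a linear model would use its coefficients instead), with best_surrogate, feature_names, and y_proba_train as placeholders:

```python
# Sketch: pull the most impactful features from the surrogate and keep the
# ones whose mean value is higher in normal scripts than in obfuscated ones.
import pandas as pd

# feature_names: the 446 Boruta-Shap selected feature names from the first post
importances = pd.Series(best_surrogate.feature_importances_, index=feature_names)
top_features = importances.sort_values(ascending=False).head(20)

df = pd.DataFrame(X_train, columns=feature_names)
normal_mask = y_proba_train < 0.5            # oracle says "normal"

normal_leaning = [
    f for f in top_features.index
    if df.loc[normal_mask, f].mean() > df.loc[~normal_mask, f].mean()
]
print(normal_leaning)   # candidate features for the "script NOP" transforms
```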

The main insight here is that we're extracting the target model as effectively as we can, and using the white box insights from that local surrogate model to guide the construction of functions that modify the obfuscated target script. Then we optimize the use of these modification functions against the target black box model. This approach lets us minimize the time and number of queries against the black box API.
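Under those assumptions, the optimization loop itself might look roughly like the following, where TRANSFORMS, apply_transforms, and query_target_probability are hypothetical stand-ins for the manual script transforms and the black box scoring API:

```python
# Rough sketch of the multi-objective Optuna study; TRANSFORMS,
# apply_transforms, and query_target_probability are hypothetical helpers.
import optuna

def objective(trial):
    # pick which "script NOP" transforms to prepend and how many times each
    choices = [
        trial.suggest_int(f"transform_{i}", 0, 3)   # 0 = skip, 1-3 = repetitions
        for i in range(len(TRANSFORMS))
    ]
    modified_script = apply_transforms(original_script, choices)

    chars_added = len(modified_script) - len(original_script)
    obfuscation_prob = query_target_probability(modified_script)  # black box query

    # minimize both the size increase and the obfuscation score
    return chars_added, obfuscation_prob

study = optuna.create_study(directions=["minimize", "minimize"])
study.optimize(objective, timeout=30 * 60)   # cap the study at 30 minutes
print(study.best_trials)                     # the Pareto-optimal trials
```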
The results of our optimization study, which ran for 30 minutes, look like this:
The Optuna optimization run.
And the final optimized script looks like this:
The final script optimized by Optuna.
Within 30 minutes we ended up with an optimized script that increased the original script size by only 23.06%, yet had an obfuscation score from the target model of 35.24, well below the 50 threshold for Normal.
The final question in my mind is "Does this adversarial sample transfer to the other two target models?" Let's see:
Checking the transferability of the adversarial sample.
This optimized sample was successful against target model 3, the Neural Network, as well as model 1, the Logistic Regression. However, it didn't transfer to the LightGBM boosted tree ensemble (model 2). This is likely because we:

Built our feature transforms from the local surrogate model tuned for model 3's probabilities
Specifically optimized towards the decision boundary for model 3
Tree ensembles can often be more difficult to evade

Observations and Final Thoughts
Evading linear models, like Logistic Regression, is easy. Evading tree ensembles, like Random Forests or Gradient Boosted Decision Trees, or Neural Networks, is a bit more difficult. Evading black box versions of these models is even harder, or at least takes more time and effort.
Most attacks in the literature involve generating a number of adversarial samples and testing how effective they are against a target model, like we did for our three target models. There isn't as much work out there that I know of (I promise I tried to look!) that involves practical real-world black box attacks on tabular data like the examples here. Depending on the target model architecture, and whether the model is white box or black box, evading a single sample can have varying levels of difficulty.
The field of adversarial machine learning is less than a decade old, with the first formalized attacks being introduced around 2014. Most of the work so far has been academic, and has heavily focused on white box attacks against image classifiers. While practical frameworks like the Adversarial Robustness Toolbox do exist, many of the attack algorithms have restrictions that make them either not applicable or not interesting for our security attack scenarios. Specifically, not all attack algorithms can be used on tabular/non-image data, and most don't let you restrict which features are modified (known as feature masking).
From the adversarial attack algorithm side, the big insight Will relayed to me is that this is all just an optimization problem! Information security, like many industries, is often ignorant of advances in other fields that could immensely help us. My personal example of this was when Andy and I were working on the problem/tool that eventually became the original BloodHound graph approach - we kept stumbling around the problem until Andy was discussing our challenges with his friend Sam, who lamented, "Dude, how have you guys not heard about graph theory?"
The problem of "how do we take these adversarial numbers and transform them back into a usable attack" is a real challenge in executing these attacks practically. However, if we change how we think about the problem, we can use approaches influenced by the Counterfit framework, or the framework itself, to optimize adversarial actions.
My hope is that the academic adversarial machine learning community continues to make more progress on practical adversarial research beyond white box attacks on image classifiers (like this!). Real world problems are harder than just playing with MNIST, and there are a lot of chances for great collaboration with security industry professionals to tackle some of these practical scenarios. There are a ton of non-image-focused gradient boosted tree models deployed in the real world as black boxes: how do we go about effectively attacking them while minimizing our number of queries?
Also remember that to have practically any hope of an adversarial attack working against a black box model, you need to know the input features! With images this is obvious, but for real world systems this gets more complicated and may require reverse engineering to understand the feature extraction mechanisms.
And finally, remember that ML models are a "living solution" - as Lee Holmes stated, "The one thing to keep in mind with ML models or signatures is that they're never 'done'. If you're not retraining based on false positives and false negatives, you've lost." For brevity, I glossed over a lot of the real-world concerns around model deployment and maintenance. The emerging subfield of MLOps deals with a lot of these issues, and I plan to revisit the practicality of implementing the models we've discussed throughout this series as I learn more about this emerging discipline.
Epilogue
But what about defenses??!!?
Yes, I know, I know, please don't @ me that I'm an irresponsible red teamer. I'll draft a follow-up post that reflects on some of the defenses around this problem space, however I'm still trying to get my footing for things like distillation and adversarial training. But I'll leave you with an insight that adversarial ML godfather Ian Goodfellow stated around four years ago:
It's also important to emphasize that all of the defenses so far are based on unrealistically easy threat models where the attacker is very limited…as a research problem it's been really hard to solve even the limited version, and this remains a very active research area with a lot of important challenges left to solve.
For future work in this specific area, here are a few general goals I'm hoping to pursue:

Examine text-based obfuscation detection instead of the AST approach described in this series. This affects the feasibility of deployment in some scenarios.
Test the effectiveness of adversarial ML defenses against these specific models.
Run this type of white/black box case study on another dataset for comparison.
Dive more into attacking tree ensembles.
???

Now that we're at the end of this firehose of information spread over three posts, I hope you enjoyed reading this material as much as I enjoyed researching and writing it. This has been a long but rewarding journey for me, and if it has sparked your interest in this field, I say jump in! You can join us in the DEF CON AI Village Discord, the #DeepThought channel on the BloodHound Slack, or feel free to email me at will [at] harmj0y.net.