How to ingest your Lacework data in Snowflake

Steve Lukose
6 min read · May 13, 2022

Lacework covers many cloud security use cases and is one of the few vendors that lets you easily export all of the underlying data for your own analysis. This opens up a slew of new use cases, such as joining it with your other operational data.

Note: Lacework does expose its data as a direct data share, but only in AWS us-west-2. If you happen to run your Snowflake account in that region you can take that option; however, you will be limited to only the last 90 days of data.

Prerequisites.

  • Admin access to Lacework
  • A role in Snowflake with permissions to set up storage integrations, file formats, pipes, schemas, tables, and views. Optional: streams/tasks.
  • An AWS S3 bucket, plus permissions to create an IAM role for Snowflake if one is not already configured.

Step 1: Configure your s3 bucket to receive raw data.

We will need a staging bucket to receive this data. You can create it in the AWS console, or, if you use Terraform, set up something like the below. You can apply a pretty short lifecycle policy to the data since we will be copying it into Snowflake.

Terraform for setting up Lacework bucket

Step 2: Configure IAM role for Lacework to push into the bucket.

You likely already have a cross-account role set up for Lacework's normal operations in AWS; however, it's best to create a new one specifically for writing to this bucket.

In AWS IAM, go to Create Role → AWS Account → Another AWS Account. Fill in Lacework’s account as seen in the trust policy below, and set a secret external ID that you will use in the next step.

The Trust policy will look like this:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::434813966438:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "YOUREXTERNALIDHERE"
        }
      }
    }
  ]
}

Add a permissions policy to the role. In the console, select the S3 service and grant “ListBucket” (restricted to the bucket you created) and “PutObject.” The policy will look like the below.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "LaceworkListETL",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::BUCKETNAME"
    },
    {
      "Sid": "LaceworkPutETL",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::BUCKETNAME/*"
    }
  ]
}

Step 3: Set up the export channel.

In Lacework, as an admin, navigate to:

Settings → Alert Channels → Add New → Amazon S3

Fill out the details of your integration, or use the Terraform provider if that is how you configured Lacework. Use the role ARN and external ID generated in step 2.

Step 4: Set up the data export.

In Lacework, as an admin, navigate to:

Settings → Configuration → Data Export Rules → Create data export rule. Fill out the details and select the channel you created in step 3.

We now have data flowing into the bucket. You could keep it there long term; however, since we’re copying it into Snowflake for analysis, you will likely treat the bucket as a short-term staging location and use Snowflake as your storage engine.

Step 5: Set up Snowflake to read from S3.

Ingesting from s3 into Snowflake is a widespread pattern and there are tons of articles and official documentation on how to set this up. The official docs are pretty easy to follow: https://docs.snowflake.com/en/user-guide/data-load-snowpipe-auto-s3.html

This article assumes that you are configuring the AWS IAM setup from the official docs.

Step 6: Create staging tables and begin ingest.

In Snowflake, the easiest pattern for S3 ingest uses Snowpipe with bucket event notifications.

In Snowflake: Create the schema and raw tables.

create schema SECURITY_DB.lacework;
use schema SECURITY_DB.lacework;
create or replace table lacework_raw_data(filename varchar, var variant);

In step 5 you will have created a storage integration like so:

CREATE OR REPLACE STORAGE INTEGRATION lacework
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = S3
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::YOURACCOUNTNUMBER:role/YOUR-ROLE-NAME'
  STORAGE_ALLOWED_LOCATIONS = ('s3://YOUR-BUCKET-NAME/');

We’ll use our storage integration to set up an external stage:

create or replace stage lacework_stage
url = 's3://YOUR-BUCKET-NAME'
storage_integration = lacework;
// You can verify that it was created by executing:
show stages;

Now that we have a stage set up, we can start ingesting the data from it into our table.

// Create a named JSON file format for the copy below to reference.
create or replace file format SECURITY_DB.lacework.lacework_json type = json;

create or replace pipe SECURITY_DB.lacework.lacework_pipe auto_ingest=true as
copy into SECURITY_DB.lacework.lacework_raw_data
from (select METADATA$FILENAME, $1 from @SECURITY_DB.lacework.lacework_stage (file_format => 'SECURITY_DB.lacework.lacework_json'));
/* You will want to view the pipe and grab the notification channel ARN; add it as an event
notification on the S3 bucket. The details are explained in the documentation from Step 5. This is
the queue Snowpipe sets up so it knows when new data is written to the bucket and can read it.
*/
SHOW PIPES;

Step 7: Test and query some data.

If everything is set up correctly, you should be able to query your raw table and see records in it.

select * from lacework_raw_data limit 10;
// You should get results with the filename, then your variant column, which is just the raw JSON.

We have a couple of options for querying the data; the fastest is to create views for the roughly 20 different schemas Lacework publishes.

The SQL below will create views over the raw table so we can immediately start exploring the data.
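
As a rough idea of what one of those views can look like, here is a minimal sketch for the DNS query dataset. It assumes the exported S3 keys contain the dataset name (e.g. DNS_QUERY) and that each record in the variant column exposes the documented fields as top-level JSON keys; inspect a few raw rows and adjust the paths and filename filter to match what Lacework actually writes.

// Sketch only: the JSON key names and filename pattern below are assumptions.
create or replace view SECURITY_DB.lacework.DNS_QUERY_V as
select
    var:MID::number                 as MID,
    var:FQDN::varchar               as FQDN,
    var:HOST_IP_ADDR::varchar       as HOST_IP_ADDR,
    var:CREATED_TIME::timestamp_ntz as CREATED_TIME
from SECURITY_DB.lacework.lacework_raw_data
where filename ilike '%DNS_QUERY%';

Repeat the same pattern for MACHINE_SUMMARY_V and the other datasets listed in the Lacework docs.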

Once complete, you should see all of your views under the Lacework schema.

Let’s test this out and see what we find. Replace the X.X with the first two octets of a VPC CIDR you are interested in. This will show all the hosts your systems attempted to resolve whose addresses fall outside your VPC range, which could be interesting if there are third-party applications that are not on your approved list.

SELECT COUNT(d.FQDN) AS "Query Count", m.hostname AS "Source Host", d.FQDN AS "Target FQDN"
FROM SECURITY_DB.LACEWORK.DNS_QUERY_V d
JOIN SECURITY_DB.LACEWORK.MACHINE_SUMMARY_V m ON m.mid = d.mid
WHERE d.HOST_IP_ADDR NOT LIKE 'X.X.%'
  AND d.CREATED_TIME >= current_date - interval '10 days'
GROUP BY d.FQDN, m.hostname
ORDER BY COUNT(d.FQDN) DESC;

Step 8: (Optional) ELT process from raw table to permanent tables.

The views are great at a very small scale, as you can get going and explore the data quickly. However, once you have any volume in your raw table, query time will increase significantly.
Fortunately, the fix is pretty straightforward: we can copy the raw data into final tables and run our queries and processes off of those.

We’ll create the tables:
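
As a minimal sketch, here is one such table mirroring the hypothetical DNS_QUERY_V view above; repeat it for each of the other datasets:

create or replace table SECURITY_DB.lacework.DNS_QUERY (
    MID           number,
    FQDN          varchar,
    HOST_IP_ADDR  varchar,
    CREATED_TIME  timestamp_ntz
);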

Once we have our tables set up, we can do a quick ELT pass to copy in all of the existing data and immediately set up a stream object, whose offset lets us capture subsequent changes. We’ll then schedule tasks to pick up those changes and insert them into our tables.

Copy existing data and set up the stream object:
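
A minimal sketch of this step, again using the hypothetical DNS_QUERY names from above; the backfill reads everything currently in the raw table, and the stream records the offset from which the tasks will pick up new rows:

// Backfill the permanent table from the existing view.
insert into SECURITY_DB.lacework.DNS_QUERY (MID, FQDN, HOST_IP_ADDR, CREATED_TIME)
select MID, FQDN, HOST_IP_ADDR, CREATED_TIME
from SECURITY_DB.lacework.DNS_QUERY_V;

// Create a stream on the raw table so rows that arrive after this point are captured.
create or replace stream SECURITY_DB.lacework.lacework_raw_stream
    on table SECURITY_DB.lacework.lacework_raw_data;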

Set up tasks that drain the stream objects for each type at a scheduled interval. Depending on your use case, you may want a very short interval (1 minute); if you are only doing analytics, an hour is likely sufficient.

Create tasks to copy from the stream objects, then turn the tasks on:
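
Here is a sketch of one such task, reusing the hypothetical stream and table names above; swap in a warehouse you own and repeat per dataset:

create or replace task SECURITY_DB.lacework.load_dns_query
    warehouse = YOUR_WAREHOUSE
    schedule  = '60 MINUTE'
when system$stream_has_data('SECURITY_DB.lacework.lacework_raw_stream')
as
insert into SECURITY_DB.lacework.DNS_QUERY (MID, FQDN, HOST_IP_ADDR, CREATED_TIME)
select var:MID::number, var:FQDN::varchar, var:HOST_IP_ADDR::varchar, var:CREATED_TIME::timestamp_ntz
from SECURITY_DB.lacework.lacework_raw_stream
where filename ilike '%DNS_QUERY%';

// Tasks are created suspended; resume them to start the schedule.
alter task SECURITY_DB.lacework.load_dns_query resume;

Note that a DML statement that reads a stream advances its offset, so if several tasks load different datasets, give each task its own stream on the raw table.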

Explore the data:

You can query these tables just as you would the views; just drop the _V from the end. They will be updated as frequently as your task interval.
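
For example, assuming the table names from Step 8 follow the views minus the _V suffix:

select count(*)
from SECURITY_DB.LACEWORK.DNS_QUERY
where CREATED_TIME >= current_date - interval '1 day';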

The table definitions and what you can expect in each data point can be found here: https://docs.lacework.com/machinesummaryv-view

I will link to new articles on data exploration use cases in the future.
