Create tables in Athena for CloudTrail logs
To make it easier for you to create tables in Athena for CloudTrail logs, our Technical Support team has put together this handy guide covering two approaches:
- CloudTrail logs table in Athena via manual partitioning
- CloudTrail logs table in Athena via partition projection
Create Tables For CloudTrail Logs In Athena Via Manual Partitioning
Consider the following example of manually creating a table for CloudTrail logs using the Athena console.
- To begin, we’ll copy the DDL statement from the CloudTrail console’s Create a table in Amazon Athena dialog box. To partition the table, we’ll paste this DDL statement into the Athena console and add a “PARTITIONED BY” clause (a sketch of the resulting DDL appears after the ALTER TABLE example below).
- Next, change the following code to point to the Amazon S3 bucket containing the log data:
s3://CloudTrail_bucket_name/AWSLogs/Account_ID/CloudTrail/
- Then we’ll double-check that all of the fields are listed correctly.
- The query will then be run in the Athena console to create the table.
- The partitions should then be loaded using the ALTER TABLE ADD PARTITION command. Because CloudTrail writes logs to an S3 path layout that is not Hive-compatible, each Region/date partition has to be added explicitly.
For example:
ALTER TABLE table_name ADD
  PARTITION (region='us-east-1',
             year='2021',
             month='02',
             day='01')
  LOCATION 's3://CloudTrail_bucket_name/AWSLogs/Account_ID/CloudTrail/us-east-1/2021/02/01/'
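To make the first step more concrete, here is a minimal sketch of what the pasted DDL looks like once the PARTITIONED BY clause is added. The table name cloudtrail_logs is a placeholder, only a few of the columns generated by the CloudTrail console are shown (the full column list appears in the partition projection example further below), and CloudTrail_bucket_name and Account_ID must be replaced with real values:
CREATE EXTERNAL TABLE cloudtrail_logs (
    eventVersion STRING,
    eventTime STRING,
    eventSource STRING,
    eventName STRING,
    awsRegion STRING
    -- remaining CloudTrail columns as generated by the console go here
)
PARTITIONED BY (region string, year string, month string, day string)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://CloudTrail_bucket_name/AWSLogs/Account_ID/CloudTrail/';
After the ALTER TABLE ADD PARTITION commands have been run, queries can filter on region, year, month, and day to limit the amount of data scanned.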
Create Tables For CloudTrail Logs In Athena Via Partition Projection
With the Athena partition projection feature, we can specify the partition scheme in advance, decrease query run time, and automate partition management. Whenever new data is added, the corresponding partitions become queryable automatically, because Athena computes them at query time instead of reading them from the metastore.
Furthermore, it eliminates the need to create partitions manually with the ALTER TABLE ADD PARTITION command.
In the example below, the CREATE TABLE statement automatically projects partitions for CloudTrail logs from a given start date up to the current date for a specific AWS Region. Here, we must replace the placeholders bucket, account-id, and aws-region with accurate values in the LOCATION and storage.location.template clauses.
Furthermore, we must replace 2020/01/01 with the preferred start date. Once the query runs successfully, we can query the table; the ALTER TABLE ADD PARTITION command is not required to load the partitions.
CREATE EXTERNAL TABLE cloudtrail_logs_pp(
    eventVersion STRING,
    userIdentity STRUCT<
        type: STRING,
        principalId: STRING,
        arn: STRING,
        accountId: STRING,
        invokedBy: STRING,
        accessKeyId: STRING,
        userName: STRING,
        sessionContext: STRUCT<
            attributes: STRUCT<
                mfaAuthenticated: STRING,
                creationDate: STRING>,
            sessionIssuer: STRUCT<
                type: STRING,
                principalId: STRING,
                arn: STRING,
                accountId: STRING,
                userName: STRING>>>,
    eventTime STRING,
    eventSource STRING,
    eventName STRING,
    awsRegion STRING,
    sourceIpAddress STRING,
    userAgent STRING,
    errorCode STRING,
    errorMessage STRING,
    requestParameters STRING,
    responseElements STRING,
    additionalEventData STRING,
    requestId STRING,
    eventId STRING,
    readOnly STRING,
    resources ARRAY<STRUCT<
        arn: STRING,
        accountId: STRING,
        type: STRING>>,
    eventType STRING,
    apiVersion STRING,
    recipientAccountId STRING,
    serviceEventDetails STRING,
    sharedEventID STRING,
    vpcEndpointId STRING
)
PARTITIONED BY (
    `timestamp` string)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
    's3://bucket/AWSLogs/account-id/CloudTrail/aws-region'
TBLPROPERTIES (
    'projection.enabled'='true',
    'projection.timestamp.format'='yyyy/MM/dd',
    'projection.timestamp.interval'='1',
    'projection.timestamp.interval.unit'='DAYS',
    'projection.timestamp.range'='2020/01/01,NOW',
    'projection.timestamp.type'='date',
    'storage.location.template'='s3://bucket/AWSLogs/account-id/CloudTrail/aws-region/${timestamp}')
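As a quick usage sketch (the table name matches the DDL above; the date range is only an illustration), the projected timestamp partition can be referenced directly in the WHERE clause to limit the data scanned, with no prior partition-loading step:
SELECT eventTime,
       eventName,
       eventSource,
       sourceIpAddress
FROM cloudtrail_logs_pp
WHERE "timestamp" >= '2021/02/01'
  AND "timestamp" < '2021/03/01'
LIMIT 10;
Because the partition values follow the yyyy/MM/dd format defined in projection.timestamp.format, string comparisons like the ones above also sort chronologically.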
Looking for a solution? We are here to help you. Contact us.