9 min read · April 28, 2026

VPC Flow Logs: enable, query, and analyze AWS network traffic

By Mohamed Aït El Kamel, Founder & AWS Solutions Architect

What are VPC Flow Logs?

VPC Flow Logs capture metadata about IP traffic going to and from network interfaces in a VPC. They record accepted and rejected connections at the ENI, subnet, or VPC level — without capturing any packet payload. What you get is the traffic record: source IP, destination IP, ports, protocol, bytes transferred, and whether the traffic was allowed or denied.

Flow logs are the primary tool for answering network-level questions after the fact: why was this connection rejected, which instance is generating unexpected outbound traffic, is there anything trying to reach my database that shouldn't be?

They are separate from application logs and CloudWatch metrics. Flow logs operate at the network layer regardless of what software is running on the instances.

What flow logs capture — and what they don't

Captured:

  • EC2 instance traffic (all ENIs)
  • Elastic Load Balancer traffic
  • RDS and ElastiCache traffic
  • Traffic to and from VPC endpoints (interface and gateway)
  • NAT Gateway traffic
  • Transit Gateway traffic (separate flow log resource)

Not captured:

  • 169.254.169.254 (EC2 instance metadata service)
  • 169.254.169.123 (Amazon Time Sync Service)
  • DHCP traffic
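  • DNS queries to the Amazon-provided DNS server (the VPC CIDR base address plus two, for example 10.0.0.2 in a 10.0.0.0/16 VPC); traffic to your own DNS servers is logged
  • Amazon Windows license activation traffic from Windows instances
  • Traffic to the reserved IP address of the default VPC router
  • Mirrored traffic from Traffic Mirroring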

If you are troubleshooting a connectivity issue and not finding relevant records, check whether the traffic falls into one of the above exclusions before assuming a logging configuration problem.
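When records seem to be missing, checking the peer address against the address-based exclusions first can save time. The sketch below is a hypothetical helper (excluded_reason is not an AWS API); it assumes you know the VPC's primary IPv4 CIDR and covers only the address-based cases, not DHCP or Windows activation traffic.

```python
import ipaddress
from typing import Optional

# Address-based exclusions from the list above. 169.254.169.253 is a second
# address for the Amazon-provided DNS server.
LINK_LOCAL_EXCLUSIONS = {
    ipaddress.ip_address("169.254.169.254"): "instance metadata service",
    ipaddress.ip_address("169.254.169.123"): "Amazon Time Sync Service",
    ipaddress.ip_address("169.254.169.253"): "Amazon-provided DNS",
}


def excluded_reason(peer_ip: str, vpc_cidr: str) -> Optional[str]:
    """Return why traffic with peer_ip would be missing from flow logs, else None.

    vpc_cidr is assumed to be the VPC's primary IPv4 CIDR block.
    """
    ip = ipaddress.ip_address(peer_ip)
    if ip in LINK_LOCAL_EXCLUSIONS:
        return LINK_LOCAL_EXCLUSIONS[ip]
    # The Amazon-provided DNS server sits at the VPC CIDR base address plus two.
    if ip == ipaddress.ip_network(vpc_cidr).network_address + 2:
        return "Amazon-provided DNS (VPC base + 2)"
    return None


print(excluded_reason("10.0.0.2", "10.0.0.0/16"))
```

A None result means the address is not one of these exclusions, so a missing record points back at routing or the logging configuration.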

Default log format (v2) includes 14 fields: version, account-id, interface-id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, log-status.

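Custom formats let you add fields introduced in later versions, and several are far more useful for real diagnosis: vpc-id, subnet-id, instance-id, tcp-flags, pkt-srcaddr, and pkt-dstaddr (version 3), plus flow-direction and traffic-path (version 5). A record's version field reports the highest version among the fields in its format, so the custom format used below produces version 5 records. Always use a custom format for new flow log configurations: the default format omits context that you will almost always want when investigating something.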
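Outside Athena or Logs Insights (for example, on a file downloaded from S3), each record is just space-separated values in the order of the log format. A minimal Python sketch, assuming the 19-field custom format used in the commands below; CUSTOM_FIELDS and parse_record are illustrative names, and the sample record is made up.

```python
# Field order of the custom format used in this post (v2 defaults plus v3/v5 fields).
CUSTOM_FIELDS = (
    "version", "account-id", "interface-id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes", "start", "end",
    "action", "log-status", "vpc-id", "subnet-id", "instance-id",
    "tcp-flags", "flow-direction",
)

# Fields that hold integers when present.
INT_FIELDS = {"version", "srcport", "dstport", "protocol", "packets",
              "bytes", "start", "end", "tcp-flags"}


def parse_record(line: str, fields: tuple = CUSTOM_FIELDS) -> dict:
    """Map one space-separated flow log line to {field name: value}.

    "-" (used for fields with no data, e.g. in NODATA records) becomes None.
    """
    values = line.split()
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    record = {}
    for name, raw in zip(fields, values):
        if raw == "-":
            record[name] = None
        elif name in INT_FIELDS:
            record[name] = int(raw)
        else:
            record[name] = raw
    return record


# Illustrative record: a short outbound HTTPS flow from an instance.
sample = ("5 123456789012 eni-0a1b2c3d 10.0.1.15 203.0.113.10 49152 443 6 10 840 "
          "1714262400 1714262460 ACCEPT OK vpc-0abc subnet-0def i-0abc123def456 3 egress")
rec = parse_record(sample)
print(rec["srcaddr"], rec["dstport"], rec["action"], rec["flow-direction"])
```

For the default format, pass the first 14 names as fields.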

Enabling VPC Flow Logs

Flow logs can deliver to three destinations: S3, CloudWatch Logs, or Kinesis Data Firehose. The choice affects cost, query tooling, and retention flexibility.

To S3 (recommended for long-term analysis):

aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids vpc-xxxxxxxx \
  --traffic-type ALL \
  --log-destination-type s3 \
  --log-destination arn:aws:s3:::my-flow-logs-bucket/vpc-flow-logs/ \
  --log-format '${version} ${account-id} ${interface-id} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} ${action} ${log-status} ${vpc-id} ${subnet-id} ${instance-id} ${tcp-flags} ${flow-direction}'

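The S3 destination needs a bucket policy that allows the delivery.logs.amazonaws.com service principal to write objects (s3:PutObject) and read the bucket ACL (s3:GetBucketAcl). If you own the bucket and have permission to read and update its policy, AWS adds this policy automatically when you create the flow log; otherwise, for example with a bucket in another account, add it yourself first or flow log creation fails.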
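A sketch of that bucket policy as a Python dict, modeled on the shape AWS documents for flow log delivery to S3 (verify it against the current VPC Flow Logs documentation before applying). The bucket name, prefix, and account ID are the example values from the command above; the aws:SourceAccount condition, which limits writes to deliveries for your own account, is an assumed hardening step.

```python
import json


def flow_log_bucket_policy(bucket: str, prefix: str, account_id: str) -> dict:
    """Bucket policy letting the log delivery service write flow logs under prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Write objects under <prefix>AWSLogs/<account>/, owned by the bucket owner.
                "Sid": "AWSLogDeliveryWrite",
                "Effect": "Allow",
                "Principal": {"Service": "delivery.logs.amazonaws.com"},
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}AWSLogs/{account_id}/*",
                "Condition": {
                    "StringEquals": {
                        "s3:x-amz-acl": "bucket-owner-full-control",
                        "aws:SourceAccount": account_id,
                    }
                },
            },
            {
                # Let the delivery service check the bucket ACL before writing.
                "Sid": "AWSLogDeliveryAclCheck",
                "Effect": "Allow",
                "Principal": {"Service": "delivery.logs.amazonaws.com"},
                "Action": "s3:GetBucketAcl",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringEquals": {"aws:SourceAccount": account_id}},
            },
        ],
    }


policy = flow_log_bucket_policy("my-flow-logs-bucket", "vpc-flow-logs/", "123456789012")
print(json.dumps(policy, indent=2))
```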

To CloudWatch Logs (for real-time alerting):

aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids vpc-xxxxxxxx \
  --traffic-type ALL \
  --log-destination-type cloud-watch-logs \
  --log-group-name /aws/vpc/flowlogs \
  --deliver-logs-permission-arn arn:aws:iam::123456789012:role/FlowLogsDeliveryRole

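The IAM role needs logs:CreateLogGroup, logs:CreateLogStream, and logs:PutLogEvents on the target log group (AWS's documented example policy also grants logs:DescribeLogGroups and logs:DescribeLogStreams), plus a trust policy that lets vpc-flow-logs.amazonaws.com assume the role.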
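The two documents the role needs can be sketched as Python dicts and serialized to JSON for aws iam create-role and aws iam put-role-policy. This is a minimal sketch: the log group ARN is an assumed example matching /aws/vpc/flowlogs, and putting the Describe actions in a separate statement on "*" is a design choice, not a requirement stated in this post.

```python
import json


def flow_logs_role_documents(log_group_arn: str):
    """Return (trust policy, permissions policy) for the flow log delivery role."""
    trust = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            # Only the VPC Flow Logs service may assume this role.
            "Principal": {"Service": "vpc-flow-logs.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }
    permissions = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Write access limited to the target log group and its streams.
                "Effect": "Allow",
                "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": [log_group_arn, f"{log_group_arn}:*"],
            },
            {
                # Describe calls, as in AWS's documented example policy.
                "Effect": "Allow",
                "Action": ["logs:DescribeLogGroups", "logs:DescribeLogStreams"],
                "Resource": "*",
            },
        ],
    }
    return trust, permissions


trust, permissions = flow_logs_role_documents(
    "arn:aws:logs:us-east-1:123456789012:log-group:/aws/vpc/flowlogs"
)
# Pass these to --assume-role-policy-document and --policy-document respectively.
print(json.dumps(trust, indent=2))
print(json.dumps(permissions, indent=2))
```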

Terraform:

resource "aws_flow_log" "main" {
  vpc_id          = aws_vpc.main.id
  traffic_type    = "ALL"
  iam_role_arn    = aws_iam_role.flow_logs.arn
  log_destination = aws_cloudwatch_log_group.flow_logs.arn

  log_format = "$${version} $${account-id} $${interface-id} $${srcaddr} $${dstaddr} $${srcport} $${dstport} $${protocol} $${packets} $${bytes} $${start} $${end} $${action} $${log-status} $${vpc-id} $${subnet-id} $${instance-id} $${tcp-flags} $${flow-direction}"
}

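In Terraform's HCL, each field placeholder must be written as $${...}: a single ${...} starts an interpolation, so Terraform would try to resolve version, srcaddr, and the other names as references and fail.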
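Because the same field list appears in the CLI command, the Terraform resource, and the Athena table, generating all three from one list avoids drift. A small Python sketch; FIELDS, log_format, hcl_escape, and athena_columns are hypothetical helpers, and the Athena names simply swap hyphens for underscores as the table below does.

```python
# Field list shared by the CLI, Terraform, and Athena examples in this post.
FIELDS = [
    "version", "account-id", "interface-id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes", "start", "end",
    "action", "log-status", "vpc-id", "subnet-id", "instance-id",
    "tcp-flags", "flow-direction",
]


def log_format(fields) -> str:
    """Flow log format string: ${field} placeholders separated by single spaces."""
    return " ".join("${" + f + "}" for f in fields)


def hcl_escape(text: str) -> str:
    """Escape HCL interpolation so Terraform passes ${...} through literally."""
    return text.replace("${", "$${")


def athena_columns(fields):
    """Athena column names: hyphens become underscores (account-id -> account_id)."""
    return [f.replace("-", "_") for f in fields]


cli_format = log_format(FIELDS)            # for --log-format '...'
terraform_format = hcl_escape(cli_format)  # for log_format = "..."
print(terraform_format.split()[0])
```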

Querying with Athena (S3 destination)

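For S3 destinations, Athena is the standard query tool. The key to keeping Athena costs low is partitioning by date and filtering on the partition columns, so each query reads only the days it needs. Partition projection computes those partitions from table properties, so you never have to register new ones as each day's logs arrive. (MSCK REPAIR TABLE would not find them anyway: the default flow log prefix uses plain year/month/day folders rather than Hive-style year=2024/ keys.)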

CREATE EXTERNAL TABLE vpc_flow_logs (
  version        int,
  account_id     string,
  interface_id   string,
  srcaddr        string,
  dstaddr        string,
  srcport        int,
  dstport        int,
  protocol       bigint,
  packets        bigint,
  bytes          bigint,
  start          bigint,
  `end`          bigint,
  action         string,
  log_status     string,
  vpc_id         string,
  subnet_id      string,
  instance_id    string,
  tcp_flags      int,
  flow_direction string
)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
STORED AS TEXTFILE
LOCATION 's3://my-flow-logs-bucket/vpc-flow-logs/AWSLogs/123456789012/vpcflowlogs/us-east-1/'
TBLPROPERTIES (
  "skip.header.line.count"        = "1",
  "projection.enabled"            = "true",
  "projection.year.type"          = "integer",
  "projection.year.range"         = "2024,2030",
  "projection.month.type"         = "integer",
  "projection.month.range"        = "1,12",
  "projection.month.digits"       = "2",
  "projection.day.type"           = "integer",
  "projection.day.range"          = "1,31",
  "projection.day.digits"         = "2",
  "storage.location.template"     = "s3://my-flow-logs-bucket/vpc-flow-logs/AWSLogs/123456789012/vpcflowlogs/us-east-1/${year}/${month}/${day}"
);
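Partition projection works by substituting each projected value into storage.location.template. The Python sketch below only illustrates that mapping for the example bucket, account, and Region in the table definition (it is not how Athena is implemented): a query filtered to one day touches one prefix, while a query without partition filters reads every prefix in the projected range.

```python
from datetime import date, timedelta

# storage.location.template from the CREATE TABLE statement above.
TEMPLATE = ("s3://my-flow-logs-bucket/vpc-flow-logs/AWSLogs/123456789012/"
            "vpcflowlogs/us-east-1/${year}/${month}/${day}")


def projected_location(d: date, template: str = TEMPLATE) -> str:
    """Fill the template the way the projection settings describe.

    Month and day are zero-padded to two digits (projection.*.digits = "2").
    """
    return (template.replace("${year}", str(d.year))
                    .replace("${month}", f"{d.month:02d}")
                    .replace("${day}", f"{d.day:02d}"))


def locations_for_range(first: date, last: date):
    """Prefixes a query filtered to first..last (inclusive) needs to read."""
    out = []
    d = first
    while d <= last:
        out.append(projected_location(d))
        d += timedelta(days=1)
    return out


print(projected_location(date(2024, 4, 28)))
```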

Useful queries once the table is set up:

Top 10 source IPs by bytes (bandwidth hogs):

SELECT srcaddr, SUM(bytes) AS total_bytes
FROM vpc_flow_logs
WHERE year = '2024' AND month = '04' AND day = '28'
GROUP BY srcaddr
ORDER BY total_bytes DESC
LIMIT 10;
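The same aggregation can run locally on a few downloaded files, which is handy for spot checks without Athena. A minimal sketch, assuming text-format files (not Parquet) where srcaddr and bytes are the 4th and 10th fields, as in both formats used in this post; top_talkers and read_log_file are illustrative names.

```python
import gzip
from collections import Counter


def top_talkers(lines, n=10):
    """Sum bytes per srcaddr and return the n largest, like the Athena query above.

    Assumes text-format records where srcaddr is field 4 and bytes is field 10,
    which holds for both the default and the custom format in this post.
    """
    totals = Counter()
    for line in lines:
        parts = line.split()
        if not parts or parts[0] == "version":
            continue  # blank line, or the header line at the top of each file
        if parts[9] == "-":
            continue  # NODATA/SKIPDATA records carry no byte count
        totals[parts[3]] += int(parts[9])
    return totals.most_common(n)


def read_log_file(path):
    """Read one delivered flow log object (gzip-compressed text) into lines."""
    with gzip.open(path, "rt") as fh:
        return fh.read().splitlines()


# Usage (hypothetical file name): top_talkers(read_log_file("downloaded.log.gz"))
```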

All REJECT traffic to a specific instance:

SELECT srcaddr, srcport, dstport, protocol, bytes, start
FROM vpc_flow_logs
WHERE instance_id = 'i-0abc123def456'
  AND action = 'REJECT'
  AND year = '2024' AND month = '04'
ORDER BY start DESC;

Detect port scans (SYN without ACK — tcp_flags = 2):

SELECT srcaddr, COUNT(*) AS syn_count
FROM vpc_flow_logs
WHERE tcp_flags = 2
  AND action = 'REJECT'
  AND year = '2024' AND month = '04' AND day = '28'
GROUP BY srcaddr
HAVING COUNT(*) > 100
ORDER BY syn_count DESC;

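Flow logs report a subset of TCP flags as a bitmask: FIN = 1, SYN = 2, RST = 4, and SYN-ACK = 18 (the SYN bit 2 plus the ACK bit 16). ACK or PSH on their own are not logged and show as 0, and flags seen within one aggregation interval are OR-ed together, so a short-lived connection can show 3 (SYN + FIN) or 19 (SYN-ACK + FIN). A high volume of SYN-only records with REJECT action from a single source is a strong port-scan signal, especially when the records span many destination ports.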
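The bitmask can be decoded with a few bitwise ANDs. The sketch also shows a variation on the port-scan query: it counts distinct destination ports per source rather than raw records, so a client retrying one blocked port is not flagged. The record dict shape and the threshold of 100 (borrowed from the query above) are assumptions; decode_tcp_flags and likely_scanners are hypothetical helpers.

```python
from collections import defaultdict

# TCP bit values. Flow logs only record FIN, SYN, RST, and SYN-ACK, so the ACK
# bit appears only as part of SYN-ACK (18).
TCP_BITS = {"FIN": 1, "SYN": 2, "RST": 4, "ACK": 16}


def decode_tcp_flags(value: int) -> list:
    """Names of the bits set in a tcp-flags value, in FIN, SYN, RST, ACK order."""
    return [name for name, bit in TCP_BITS.items() if value & bit]


def likely_scanners(records, min_ports: int = 100) -> dict:
    """Sources with SYN-only REJECTs to at least min_ports distinct destination ports.

    records: iterable of dicts with srcaddr, dstport, action, tcp_flags keys.
    """
    ports = defaultdict(set)
    for r in records:
        if r["action"] == "REJECT" and r["tcp_flags"] == 2:
            ports[r["srcaddr"]].add(r["dstport"])
    return {src: len(p) for src, p in ports.items() if len(p) >= min_ports}


print(decode_tcp_flags(19))
```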

CloudWatch Logs Insights queries

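For the CloudWatch Logs destination, Logs Insights needs no table setup and is convenient for recent data; it bills $0.005 per GB scanned rather than Athena's $5 per TB. The trade-off is delivery cost: $0.50/GB into CloudWatch Logs versus $0.25/GB to S3 (us-east-1, first 10 TB per month). For flow logs in the default format, Logs Insights discovers the fields automatically under camelCase names such as srcAddr, dstAddr, and dstPort.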

All REJECT traffic sorted by bytes:

fields srcAddr, dstAddr, dstPort, action, bytes
| filter action = "REJECT"
| stats sum(bytes) as total_bytes by srcAddr, dstAddr
| sort total_bytes desc
| limit 20

Top talkers on port 443:

fields srcAddr, dstAddr, bytes
| filter dstPort = 443 and action = "ACCEPT"
| stats sum(bytes) as total_bytes by srcAddr
| sort total_bytes desc
| limit 10

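Unexpected outbound connections (traffic going to external IPs). flow-direction is not part of the default format, so this query assumes the 19-field custom format from the Terraform example and extracts the fields with parse:

parse @message "* * * * * * * * * * * * * * * * * * *" as version, account_id, interface_id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, log_status, vpc_id, subnet_id, instance_id, tcp_flags, flow_direction
| filter flow_direction = "egress"
  and not dstaddr like /^10\./
  and not dstaddr like /^172\.(1[6-9]|2[0-9]|3[01])\./
  and not dstaddr like /^192\.168\./
| stats sum(bytes) as total_bytes by dstaddr
| sort total_bytes desc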
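Keeping the number of * globs in a parse command in step with a custom format is easy to get wrong by hand. This hypothetical Python helper generates the clause from the field list (hyphens become underscores), and is_rfc1918 spells out the private ranges an egress filter like this is meant to exclude; note that 172.16.0.0/12 spans 172.16.x.x through 172.31.x.x.

```python
import ipaddress

# RFC 1918 private IPv4 ranges.
RFC1918 = [ipaddress.ip_network(c) for c in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def insights_parse_clause(fields) -> str:
    """Build a Logs Insights parse command with one * per field, in format order."""
    names = [f.replace("-", "_") for f in fields]
    globs = " ".join("*" for _ in names)
    return f'parse @message "{globs}" as {", ".join(names)}'


def is_rfc1918(addr: str) -> bool:
    """True if addr falls in a private IPv4 range (always False for IPv6)."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)


print(insights_parse_clause(["version", "account-id", "srcaddr"]))
```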

Cost considerations

Flow log costs come from two sources: log delivery and storage/query.

Destination | Delivery cost (us-east-1, first 10 TB/month) | Query cost
CloudWatch Logs | $0.50/GB ingested | Logs Insights: $0.005/GB scanned
S3 | $0.25/GB delivered | Athena: $5/TB scanned

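For most environments, S3 + Athena is the cheaper option: delivery costs half as much per GB, and S3 storage is cheaper than CloudWatch Logs storage. A busy VPC generating 100 GB/month of flow logs costs roughly $25 to deliver to S3 versus $50 to CloudWatch Logs, before storage and query charges.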
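The arithmetic behind that comparison, as a small Python sketch. The rates are assumptions: us-east-1 list prices at the time of writing for the first 10 TB per month ($0.50/GB into CloudWatch Logs, $0.25/GB to S3, $5/TB for Athena, $0.005/GB for Logs Insights), worth re-checking on the AWS pricing pages. Cheaper higher-volume tiers are not modeled, and 1 TB is treated as 1,024 GB.

```python
# Assumed us-east-1 rates for the first 10 TB/month of flow log delivery.
DELIVERY_PER_GB = {"cloudwatch": 0.50, "s3": 0.25}
ATHENA_PER_TB = 5.00
INSIGHTS_PER_GB = 0.005


def monthly_delivery_cost(gb_per_month: float, destination: str) -> float:
    """Delivery cost in USD for one month at the first-tier rate."""
    if gb_per_month > 10 * 1024:
        raise ValueError("these rates only cover the first 10 TB/month")
    return round(gb_per_month * DELIVERY_PER_GB[destination], 2)


def query_cost(gb_scanned: float, engine: str) -> float:
    """Scan cost in USD for one query."""
    if engine == "athena":
        return gb_scanned / 1024 * ATHENA_PER_TB
    if engine == "insights":
        return gb_scanned * INSIGHTS_PER_GB
    raise ValueError(f"unknown engine: {engine}")


for dest in ("s3", "cloudwatch"):
    print(dest, monthly_delivery_cost(100, dest))
```

Partition filters and Parquet shrink gb_scanned, which is why they matter more than the per-GB query rate.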

Practical cost controls for S3 + Athena:

  • Partition by date and always filter on the partition columns; partition projection (shown above) keeps new days queryable with no partition maintenance
  • Set S3 lifecycle rules to move logs to Glacier after 90 days and expire after a year
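  • Deliver logs as Apache Parquet (FileFormat=parquet in the --destination-options of create-flow-logs) if you query frequently; columnar files dramatically reduce the data scanned per query, though the Athena table then needs a Parquet definition instead of the text one above
  • Use --traffic-type ALL only where you need it; REJECT-only flow logs are usually much smaller and may be enough for security-focused accounts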
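The lifecycle rule from the second bullet, sketched as the JSON document that aws s3api put-bucket-lifecycle-configuration accepts. The prefix matches the example destination used earlier and the rule ID is a made-up name; GLACIER is the S3 Glacier Flexible Retrieval storage class, so pick a different class if you need faster restores.

```python
import json


def flow_log_lifecycle(prefix: str = "vpc-flow-logs/",
                       glacier_after_days: int = 90,
                       expire_after_days: int = 365) -> dict:
    """S3 lifecycle configuration: archive flow logs to Glacier, then delete them."""
    # Sanity check: deleting before the transition would make the Glacier step pointless.
    if expire_after_days <= glacier_after_days:
        raise ValueError("expiration must come after the Glacier transition")
    return {
        "Rules": [{
            "ID": "flow-logs-archive-then-expire",  # hypothetical rule name
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Transitions": [{"Days": glacier_after_days, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": expire_after_days},
        }]
    }


# Save as lifecycle.json and apply with:
#   aws s3api put-bucket-lifecycle-configuration --bucket my-flow-logs-bucket \
#     --lifecycle-configuration file://lifecycle.json
print(json.dumps(flow_log_lifecycle(), indent=2))
```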

Use cases

Security monitoring: detect port scans (high-volume SYN-only traffic to REJECT), unexpected inbound connections to private subnets, and data exfiltration (anomalously high bytes to external IPs). Flow logs give you the evidence trail after an incident.

Compliance: demonstrate network isolation between environments. Auditors often ask whether production traffic can reach development environments; flow logs showing no ACCEPT records between the environments' CIDRs over the audit period back up what the security group, network ACL, and route table configuration says.
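An isolation check over parsed records is short to express. A minimal sketch, assuming records as dicts with srcaddr, dstaddr, and action keys (None for NODATA records) and example CIDRs for the two environments; cross_env_accepts is a hypothetical helper. An empty result supports the isolation claim for the period the logs cover; it cannot show traffic that nobody attempted.

```python
import ipaddress


def cross_env_accepts(records, env_a: str, env_b: str) -> list:
    """ACCEPT records with one endpoint in env_a and the other in env_b (either direction)."""
    a, b = ipaddress.ip_network(env_a), ipaddress.ip_network(env_b)
    hits = []
    for r in records:
        if r["action"] != "ACCEPT" or r["srcaddr"] is None:
            continue  # only accepted flows with addresses matter here
        src = ipaddress.ip_address(r["srcaddr"])
        dst = ipaddress.ip_address(r["dstaddr"])
        if (src in a and dst in b) or (src in b and dst in a):
            hits.append(r)
    return hits


# Example CIDRs: production 10.0.0.0/16, development 10.1.0.0/16.
sample = [{"srcaddr": "10.0.1.5", "dstaddr": "10.1.2.3", "action": "ACCEPT"}]
print(len(cross_env_accepts(sample, "10.0.0.0/16", "10.1.0.0/16")))
```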

Cost optimization: identify which instances are driving NAT Gateway data processing charges. Egress records to destinations outside your VPC CIDRs, grouped by instance_id, reveal the specific workloads generating the most outbound traffic; often a handful of instances are responsible for the majority of the bill.

Troubleshooting: determine whether a failed connection was blocked by a security group or network ACL (a REJECT record on the destination instance's ENI) or never arrived at all (no record). The absence of a record is itself diagnostic: once you have ruled out the exclusions listed earlier and allowed for the aggregation interval and delivery delay, it means the traffic did not reach the network interface, which points to routing.
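The decision rule in this paragraph, written out as a rough Python sketch. It assumes records collected from the destination ENI as dicts with dstaddr, dstport, and action keys; diagnose is a hypothetical helper and returns a hint, not a verdict.

```python
def diagnose(records, dst_ip: str, dst_port: int) -> str:
    """Rough reading of destination-ENI flow log records for one IP and port."""
    matching = [r for r in records
                if r["dstaddr"] == dst_ip and r["dstport"] == dst_port]
    if not matching:
        # Nothing logged: never reached the ENI, excluded traffic, or not delivered yet.
        return "no record: check routing, the exclusion list, and delivery delay"
    if any(r["action"] == "REJECT" for r in matching):
        return "REJECT: a security group or network ACL dropped the traffic"
    return "ACCEPT: the network allowed it; check the host firewall or application"


print(diagnose([{"dstaddr": "10.0.2.9", "dstport": 5432, "action": "REJECT"}], "10.0.2.9", 5432))
```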

VizCon + Flow Logs: two layers of network insight

VizCon shows you the topology — which resources exist, how they are connected, what security group rules are configured. VPC Flow Logs show you the runtime traffic — what is actually flowing, in which direction, and whether it is being allowed or denied.

Used together: VizCon gives you the map, Flow Logs give you the movement. When you find a REJECT in your flow logs for a connection that should be allowed, open VizCon to immediately see the network path between source and destination, identify which security group is on the path, and pinpoint which rule is causing the block — rather than manually correlating interface IDs back to resources and clicking through security group rules in the console.

See how VizCon works in 10 minutes

Book a personalized demo and discover how VizCon visualizes your live AWS infrastructure.
