Terraform Ingester

The Terraform Ingester is a component ingest provider that parses Terraform configuration files to extract major software components and infrastructure dependencies directly from the authoritative source without hardcoded assumptions.

Overview

The Terraform ingester implements the IngestProvider interface and extracts components from Infrastructure as Code (IaC) configurations using generic pattern matching. Unlike parsers that hardcode software names, it discovers component names and versions directly from Terraform's native structures.

Architecture

The Terraform ingester consists of two main classes:

Metaport\Component\Ingest\Provider\Terraform\Ingester

The main ingester class that implements the IngestProvider interface. It:

  • Searches for common Terraform configuration files (main.tf, terraform.tf, versions.tf, providers.tf)
  • Parses each file using the Configuration class
  • Merges dependencies from multiple files, avoiding duplicates
  • Returns an array of Dependency objects
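
The merge step can be pictured with a small sketch. This is an illustration only, not the actual Ingester code: mergeDependencies() is a hypothetical helper, plain arrays stand in for Dependency objects, and treating "same name and same version" as a duplicate is an assumption.

```php
<?php
// Hypothetical sketch: merge per-file dependency lists, keeping the first
// occurrence of each name/version pair. Plain arrays stand in for Dependency objects.
function mergeDependencies(array ...$perFile): array
{
    $merged = [];
    foreach ($perFile as $dependencies) {
        foreach ($dependencies as $dependency) {
            // Assumption: a duplicate is an entry with the same name and version.
            $key = $dependency['name'] . '@' . $dependency['version'];
            if (!isset($merged[$key])) {
                $merged[$key] = $dependency;
            }
        }
    }
    return array_values($merged);
}

// e.g. results parsed from main.tf and versions.tf
$fromMain = [
    ['name' => 'aws', 'version' => '5.0'],
    ['name' => 'mysql', 'version' => '8.0.35'],
];
$fromVersions = [
    ['name' => 'aws', 'version' => '5.0'],
];
$all = mergeDependencies($fromMain, $fromVersions); // two entries: aws, mysql
```

Keying on both name and version keeps two different versions of the same component visible, which seems preferable for an inventory; the real class may deduplicate differently.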

Metaport\Component\Ingest\Provider\Terraform\Configuration

The configuration parser that extracts components from HCL (HashiCorp Configuration Language) syntax using generic pattern matching. It identifies components through:

  • Provider sources: Extracts tool names from source = "namespace/toolname"
  • Container image references: Parses image = "name:version" patterns
  • Engine specifications: Finds engine = "name" with engine_version = "version"
  • Runtime specifications: Parses runtime = "language-version" patterns
  • Version indicators: Discovers any *_version = "version" patterns

Generic Pattern Recognition

The parser uses the following generic patterns to discover components without hardcoding names, and normalizes the extracted component names automatically:

Pattern 1: Provider Sources

required_providers {
  aws = {
    source  = "hashicorp/aws"    # Extracts "aws"
    version = "~> 5.0"           # Extracts "5.0"
  }
}
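
A regex for this pattern might look like the sketch below. The function name extractProviders() and the exact expression are assumptions made for illustration; the real Configuration class may match differently. The sketch only handles two-part sources such as "hashicorp/aws".

```php
<?php
// Hypothetical sketch of Pattern 1; not the actual Configuration code.
function extractProviders(string $hcl): array
{
    $found = [];
    // Match `source = "namespace/name"` followed, inside the same block,
    // by `version = "<constraint>"`.
    $pattern = '~source\s*=\s*"[^"/]+/([^"/]+)"[^}]*?version\s*=\s*"([^"]+)"~';
    preg_match_all($pattern, $hcl, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        // Keep only the numeric part of a constraint such as "~> 5.0".
        if (preg_match('~\d+(?:\.\d+)*~', $match[2], $version)) {
            $found[] = ['name' => $match[1], 'version' => $version[0]];
        }
    }
    return $found;
}

$hcl = <<<'HCL'
required_providers {
  aws = {
    source  = "hashicorp/aws"
    version = "~> 5.0"
  }
}
HCL;

$providers = extractProviders($hcl); // [['name' => 'aws', 'version' => '5.0']]
```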

Pattern 2: Container Images

image = "nginx:1.25.3"           # Extracts "nginx" version "1.25.3"
image = "node:20-alpine"         # Extracts "nodejs" version "20" (normalized)
image = "python:3.12-slim"       # Extracts "python" version "3.12"
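
As a rough sketch of how an image reference could be split, assuming a hypothetical parseImage() helper: the tag must start with a numeric version, and the node to nodejs alias simply mirrors the example above. Neither detail is confirmed to match the actual parser.

```php
<?php
// Hypothetical sketch of Pattern 2; not the actual Configuration code.
function parseImage(string $image): ?array
{
    // Optional registry/path prefix, then "name:tag" where the tag starts
    // with a dotted numeric version. Suffixes such as "-alpine" are ignored.
    if (!preg_match('~^(?:.*/)?([^/:]+):(\d+(?:\.\d+)*)~', $image, $match)) {
        return null; // no numeric tag, e.g. "nginx:latest"
    }
    // Assumed alias, taken from the examples in this document.
    $name = $match[1] === 'node' ? 'nodejs' : $match[1];
    return ['name' => $name, 'version' => $match[2]];
}

$nginx = parseImage('nginx:1.25.3');   // ['name' => 'nginx', 'version' => '1.25.3']
$node  = parseImage('node:20-alpine'); // ['name' => 'nodejs', 'version' => '20']
```

Images tagged without a numeric version return null in this sketch, in line with the requirement for explicit version specifications.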

Pattern 3: Engine + Version Pairs

engine         = "mysql"         # Component name
engine_version = "8.0.35"        # Component version
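
Pairing the two attributes requires looking at them within the same block. The sketch below shows one way this might be done with regexes; extractEngines() is a hypothetical name, and restricting the search to brace bodies without nested braces is a simplifying assumption.

```php
<?php
// Hypothetical sketch of Pattern 3; not the actual Configuration code.
function extractEngines(string $hcl): array
{
    $found = [];
    // Inspect each `{ ... }` body that contains no nested braces, so an
    // engine is only paired with an engine_version from the same block.
    preg_match_all('~\{([^{}]*)\}~', $hcl, $blocks);
    foreach ($blocks[1] as $body) {
        if (preg_match('~^\s*engine\s*=\s*"([^"]+)"~m', $body, $engine)
            && preg_match('~^\s*engine_version\s*=\s*"([^"]+)"~m', $body, $version)) {
            $found[] = ['name' => $engine[1], 'version' => $version[1]];
        }
    }
    return $found;
}

$hcl = <<<'HCL'
resource "aws_db_instance" "main" {
  engine         = "mysql"
  engine_version = "8.0.35"
  instance_class = "db.t3.micro"
}

resource "aws_db_instance" "unversioned" {
  engine = "postgres"
}
HCL;

$engines = extractEngines($hcl); // only mysql 8.0.35; postgres has no version
```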

Pattern 4: Runtime Specifications

runtime = "python3.11"           # Extracts "python" version "3.11" (normalized)
runtime = "nodejs18.x"           # Extracts "nodejs" version "18" (normalized)
runtime = "java11"               # Extracts "java" version "11" (normalized)
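
The runtime split can be sketched as "leading letters, then a dotted number". parseRuntime() is a hypothetical helper and the expression is an assumption, not the actual implementation.

```php
<?php
// Hypothetical sketch of Pattern 4; not the actual Configuration code.
function parseRuntime(string $runtime): ?array
{
    // Leading letters form the name; the digits and dots that follow form
    // the version. A trailing ".x" is not numeric, so it is left out.
    if (!preg_match('~^([a-z]+)(\d+(?:\.\d+)*)~i', $runtime, $match)) {
        return null; // no version embedded in the runtime string
    }
    return ['name' => strtolower($match[1]), 'version' => $match[2]];
}

$python = parseRuntime('python3.11'); // ['name' => 'python', 'version' => '3.11']
$node   = parseRuntime('nodejs18.x'); // ['name' => 'nodejs', 'version' => '18']
```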

Pattern 5: Generic Version Indicators

mysql_version = "8.0"            # Extracts "mysql" version "8.0"
node_version = "18.17.0"         # Extracts "node" version "18.17.0"
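
A minimal sketch of the generic *_version match follows. extractVersionIndicators() is a hypothetical name. Skipping engine_version (because Pattern 3 already covers it) and ignoring prefixes that contain underscores are assumptions made to keep the example small.

```php
<?php
// Hypothetical sketch of Pattern 5; not the actual Configuration code.
function extractVersionIndicators(string $hcl): array
{
    $found = [];
    // Match `<name>_version = "<version>"` at the start of a line.
    $pattern = '~^\s*([a-z][a-z0-9]*)_version\s*=\s*"([^"]+)"~mi';
    preg_match_all($pattern, $hcl, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        $name = strtolower($match[1]);
        // Assumption: engine_version is paired with `engine` elsewhere.
        if ($name === 'engine') {
            continue;
        }
        $found[] = ['name' => $name, 'version' => $match[2]];
    }
    return $found;
}

$hcl = <<<'HCL'
mysql_version = "8.0"
node_version = "18.17.0"
engine_version = "8.0.35"
HCL;

$indicators = extractVersionIndicators($hcl); // mysql 8.0 and node 18.17.0
```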

Component Name Normalization

The parser automatically normalizes component names to ensure consistency:

  • Removes version numbers: nodejs18 → nodejs, python3 → python
  • Standardizes names: node → nodejs
  • Extracts numeric versions only: 8.3-fpm-alpine → 8.3, 20-alpine → 20

Examples of Normalization:

  • runtime = "python3.11" → Component: python, Version: 3.11
  • runtime = "nodejs18.x" → Component: nodejs, Version: 18
  • image = "node:20-alpine" → Component: nodejs, Version: 20
  • image = "php:8.3-fpm-alpine" → Component: php, Version: 8.3

No Hardcoded Mappings

This implementation deliberately avoids hardcoded software names or version mappings. Instead, it:

  • Extracts component names directly from Terraform configuration values
  • Uses the names as specified in the infrastructure code, subject only to the generic normalization described above
  • Derives versions from the strings as written, keeping the numeric part and dropping suffixes such as -alpine or .x
  • Relies on the existing Component model to match against known software

Usage

The Terraform ingester is automatically available when configured as an application's IMBackendName:

$app = Application::get()->byID(123);
$app->IMBackendName = 'Terraform';
$app->write();

// Get components from Terraform configurations
$components = $app->getIMService()->getComponents();

Configuration Files

The ingester looks for these files in the VCS repository:

  1. main.tf - Main configuration file
  2. terraform.tf - Terraform and provider requirements
  3. versions.tf - Version constraints
  4. providers.tf - Provider configurations

Example Terraform Configuration

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

resource "aws_db_instance" "main" {
  engine         = "mysql"
  engine_version = "8.0.35"
  instance_class = "db.t3.micro"
}

resource "aws_lambda_function" "api" {
  runtime = "python3.11"
}

resource "aws_ecs_task_definition" "app" {
  container_definitions = jsonencode([
    {
      name  = "web"
      image = "nginx:1.25.3"
    },
    {
      name  = "app"
      image = "php:8.3-fpm-alpine"
    }
  ])
}

This configuration extracts:

  • aws (version 5.0), from the provider source
  • mysql (version 8.0.35), from engine + engine_version
  • python (version 3.11), from the runtime specification, normalized from "python3.11"
  • nginx (version 1.25.3), from the container image
  • php (version 8.3), from the container image, with the numeric version extracted from "8.3-fpm-alpine"

Testing

Test fixtures are available in app/test/fixtures/metaportingest/1.0/terraform/ with example configurations demonstrating various component extraction patterns.

Run tests with:

docker compose -f docker-compose-dev.yml exec app vendor/bin/phpunit app/test/php/Component/Ingest/Provider/Terraform/

Benefits of Generic Approach

  • Future-proof: Automatically supports new software without code changes
  • Accurate: Uses exact names and versions from infrastructure definitions
  • Maintainable: No hardcoded mappings to maintain
  • Flexible: Works with any software that follows Terraform's naming conventions
  • Authoritative: Extracts data directly from the source of truth

Limitations

  • Parses HCL using regex patterns (not a full HCL parser)
  • Requires explicit version specifications in Terraform configurations
  • May miss dynamically generated resource configurations
  • Depends on consistent naming patterns in Terraform resources