Transform Image Data into Insights with VisualInsight’s AI Automation

Extracting insights from images can often feel challenging. Whether you’re a researcher, an analyst, or simply curious, efficiently analyzing and understanding images is crucial but not always straightforward. This is where VisualInsight comes in.

Figure 1: Gemini model

GitHub repository: https://github.com/yotambraun/VisualInsight

Challenges with Traditional Image Analysis Methods

Manual Effort: Finding the right tools, writing custom scripts, and working with large datasets often involves significant manual work.
Complexity: Navigating advanced algorithms, ML frameworks, or open-source projects can be overwhelming, especially for smaller teams.
Storage and Security: Ensuring data is securely stored and easily retrievable adds another layer of complexity.
Scaling: Handling larger datasets requires scalable infrastructure, which often involves high overhead.

VisualInsight addresses these challenges with a seamless and automated solution for image analysis.

Figure 2: Example of the user interface where you can upload images

As you can see, the UI simplifies the process: you just drag and drop your image, with no complicated scripts required.

Introducing VisualInsight

Core Idea

VisualInsight is a Streamlit-based web application that simplifies image analysis using Google Generative AI (Gemini). It incorporates AWS S3 for secure storage of original images and results.

Figure 3: Analysis results displayed in the Streamlit application

By automating much of the heavy lifting, VisualInsight ensures you spend less time on configuration and more time on innovation.

Key Components

Streamlit UI: A user-friendly interface for uploading, viewing, and analyzing images.
LLM Service (Google Gemini): Advanced text-based insights derived from images.
AWS S3 Storage: Secure storage for files and AI-generated analyses.
Docker & Terraform: Infrastructure for quick deployments and reproducibility.
CI/CD via GitHub Actions: Automated builds, tests, and deployments for reliability.

Architecture:

How VisualInsight Works

1. Upload an Image: Drag and drop a JPG or PNG file onto the application.

2. AI Analysis with Google Gemini: The uploaded image is then passed to the LLMService class, which uses Google’s Generative AI (Gemini) to generate descriptive insights about the image content.

Figure 4: Further analysis details being displayed to the user

3. Storage in AWS S3: Once analyzed, the application uploads both the original image and any analysis results to an S3 bucket for safekeeping.

4. Display Results: Insights are displayed in the application interface for immediate feedback.

Figure 5: Another view of the analysis interface

Code Highlights

Below are some of the core services that power VisualInsight.

1. LLM Service (app/services/llm_service.py)

Handles the interaction with Google Gemini for image analysis.

import google.generativeai as genai
import os
from datetime import datetime
from PIL import Image
from utils.logger import setup_logger

logger = setup_logger()

class LLMService:
    def __init__(self):
        genai.configure(api_key=os.getenv('GOOGLE_API_KEY'))
        self.model = genai.GenerativeModel('gemini-1.5-flash-002')
        
        self.prompt = """
        Analyze this Image and provide:
          1. Image type
          2. Key information
          3. Important details
          4. Notable observations
        """

    def analyze_document(self, image: Image.Image) -> dict:
        try:
            logger.info("Sending request to LLM")
            # Generate content directly with the PIL image
            response = self.model.generate_content([
                self.prompt, 
                image
            ])
            
            return {
                "analysis": response.text,
                "timestamp": datetime.now().isoformat()
            }
            
        except Exception as e:
            logger.error(f"LLM analysis failed: {str(e)}")
            # Chain the original exception so the root cause stays in the traceback
            raise Exception(f"Failed to analyze document: {str(e)}") from e

What’s Happening Here?

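The constructor configures the Google Generative AI (Gemini) client with an API key read from the GOOGLE_API_KEY environment variable.
A default prompt outlines the kind of analysis we want.
The analyze_document method sends the prompt and the image to Gemini and returns the text-based analysis together with a timestamp.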
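
Because analyze_document only needs an object that offers a generate_content method, the same flow can be exercised without an API key. The sketch below is an illustration under that assumption, not code from the repository: FakeResponse, FakeModel, and analyze_with are hypothetical names, and the fake simply returns a canned reply.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FakeResponse:
    """Stand-in for the Gemini response object; only `.text` is used."""
    text: str


class FakeModel:
    """Hypothetical test double exposing the one method the service calls."""

    def __init__(self, reply: str):
        self.reply = reply
        self.calls = []  # records every request for later inspection

    def generate_content(self, parts):
        self.calls.append(parts)
        return FakeResponse(text=self.reply)


def analyze_with(model, prompt: str, image) -> dict:
    """Mirror the analyze_document flow: send prompt + image, wrap the text."""
    response = model.generate_content([prompt, image])
    return {
        "analysis": response.text,
        "timestamp": datetime.now().isoformat(),
    }


fake = FakeModel(reply="Image type: invoice")
result = analyze_with(fake, "Analyze this image", image=object())
print(result["analysis"])  # Image type: invoice
```

Passing the model in as a parameter, rather than constructing it inside the function, is what makes this kind of offline check possible.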

2. S3 Service (app/services/s3_service.py)

Uploads files to AWS S3 with timestamped keys and generates presigned URLs for private access.

import boto3
import os
from datetime import datetime
from utils.logger import setup_logger

logger = setup_logger()

class S3Service:
    def __init__(self):
        self.s3_client = boto3.client(
            's3',
            aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'),
            aws_secret_access_key=os.getenv('AWS_SECRET_ACCESS_KEY'),
            region_name=os.getenv('AWS_REGION', 'us-east-1')
        )
        self.bucket_name = os.getenv('S3_BUCKET_NAME')

    def upload_file(self, file):
        """Upload file to S3 and return the URL"""
        try:
            # Generate unique filename
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            file_key = f"uploads/{timestamp}_{file.name}"
            
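            # Rewind first: the stream may already have been read
            # (main.py opens it with PIL before calling this method)
            file.seek(0)

            # Upload to S3
            self.s3_client.upload_fileobj(
                file,
                self.bucket_name,
                file_key
            )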
            
            # Generate presigned URL that expires in 1 hour
            url = self.s3_client.generate_presigned_url(
                'get_object',
                Params={
                    'Bucket': self.bucket_name,
                    'Key': file_key
                },
                ExpiresIn=3600
            )
            
            # Log the key rather than the presigned URL, which grants access to whoever holds it
            logger.info(f"File uploaded successfully: {file_key}")
            return url
            
        except Exception as e:
            logger.error(f"S3 upload failed: {str(e)}")
            raise Exception(f"Failed to upload file to S3: {str(e)}")

Figure 6: The AWS S3 bucket that stores uploaded images and analysis results

Core Features:

Uses boto3 to interact with AWS S3.
Generates a time-stamped key for each file.
Creates a presigned URL for private file access without requiring you to open up the entire bucket.
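
The timestamped key has one-second resolution, so two uploads of the same file name within the same second would map to the same key. A minimal sketch of a key builder that also strips path components and optionally adds a short unique suffix is shown here; build_upload_key and its unique_suffix parameter are hypothetical and not part of the repository.

```python
import os
import re
import uuid
from datetime import datetime


def build_upload_key(filename: str, now: datetime, unique_suffix: str = "") -> str:
    """Build an S3 object key of the form uploads/<timestamp>[_<suffix>]_<name>."""
    # Keep only the final path component so a name like "../../x.png"
    # cannot escape the uploads/ prefix.
    base = os.path.basename(filename.replace("\\", "/"))
    # Replace characters that are awkward in URLs with underscores.
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", base) or "file"
    timestamp = now.strftime("%Y%m%d_%H%M%S")
    suffix = f"_{unique_suffix}" if unique_suffix else ""
    return f"uploads/{timestamp}{suffix}_{safe}"


fixed = datetime(2025, 1, 2, 3, 4, 5)
print(build_upload_key("my cat.png", fixed))  # uploads/20250102_030405_my_cat.png
# A random suffix makes same-second uploads of the same name distinct.
print(build_upload_key("cat.png", fixed, unique_suffix=uuid.uuid4().hex[:8]))
```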

3. The Streamlit Application (app/main.py)

Provides the user interface for file uploads, analysis initiation, and displaying results.

import streamlit as st
import os
from dotenv import load_dotenv
from services.s3_service import S3Service
from services.llm_service import LLMService
from utils.logger import setup_logger
from PIL import Image

# Load environment variables
load_dotenv()

# Setup logging
logger = setup_logger()

# Initialize services
s3_service = S3Service()
llm_service = LLMService()

def main():
    st.title("Document Analyzer")
    
    uploaded_file = st.file_uploader("Upload a document", type=['png', 'jpg', 'jpeg'])
    
    if uploaded_file:
        # Display image
        image = Image.open(uploaded_file)
        st.image(image, caption='Uploaded Document', use_column_width=True)
        
        if st.button('Analyze Document'):
            with st.spinner('Processing...'):
                try:
                    # Analyze with LLM directly
                    logger.info("Starting document analysis")
                    analysis = llm_service.analyze_document(image)
                    
                    # Upload to S3 for storage
                    logger.info(f"Uploading file: {uploaded_file.name}")
                    s3_url = s3_service.upload_file(uploaded_file)
                    
                    # Display results
                    st.success("Analysis Complete!")
                    st.json(analysis)
                    
                except Exception as e:
                    logger.error(f"Error processing document: {str(e)}")
                    st.error(f"Error: {str(e)}")

if __name__ == "__main__":
    main()

Streamlit handles the UI: file upload, display, button triggers.
LLMService and S3Service are orchestrated together to handle the AI query and file upload.
Real-time logs inform you of the status and highlight any issues.
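
The excerpt above uploads the image itself. One possible way to persist the analysis dictionary next to it is sketched below. It assumes a boto3-style client that offers put_object; the results/ prefix, the store_analysis function, and the RecordingClient stand-in are hypothetical and only serve to make the example runnable offline.

```python
import json


def store_analysis(s3_client, bucket: str, image_key: str, analysis: dict) -> str:
    """Write the analysis dict as JSON under results/ and return its key."""
    # Derive "results/<image name>.json" from "uploads/<image name>".
    name = image_key.split("/", 1)[-1]
    result_key = f"results/{name}.json"
    s3_client.put_object(
        Bucket=bucket,
        Key=result_key,
        Body=json.dumps(analysis, ensure_ascii=False).encode("utf-8"),
        ContentType="application/json",
    )
    return result_key


class RecordingClient:
    """Hypothetical stand-in for the S3 client; it just remembers what was put."""

    def __init__(self):
        self.objects = {}

    def put_object(self, Bucket, Key, Body, ContentType):
        self.objects[(Bucket, Key)] = Body


client = RecordingClient()
key = store_analysis(
    client,
    "my-visualinsight-bucket",
    "uploads/20250102_030405_cat.png",
    {"analysis": "A cat on a sofa", "timestamp": "2025-01-02T03:04:05"},
)
print(key)  # results/20250102_030405_cat.png.json
```

In the real application the S3Service client would take the place of RecordingClient.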

Running VisualInsight Locally

1. Clone the Repository

git clone https://github.com/yotambraun/VisualInsight.git
cd VisualInsight

2. Environment Setup

Create a .env file at the project root:

AWS_ACCESS_KEY_ID=YOUR_AWS_KEY
AWS_SECRET_ACCESS_KEY=YOUR_AWS_SECRET
AWS_REGION=us-east-1
S3_BUCKET_NAME=YOUR_BUCKET_NAME
GOOGLE_API_KEY=YOUR_GOOGLE_GENAI_KEY

3. Install Dependencies

pip install -r requirements.txt

4. Run the App

streamlit run app/main.py

Navigate to http://localhost:8501 in your browser to start using VisualInsight!
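
Both services read their settings with os.getenv, which returns None without complaint when a variable is missing, so a forgotten S3_BUCKET_NAME would surface only at upload time. A small startup check along these lines could catch that earlier. It is a sketch, not part of the repository: missing_env_vars and REQUIRED_VARS are hypothetical names, and AWS_REGION is left out because the S3 service falls back to us-east-1.

```python
import os

# Variables the services read without a default value.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "S3_BUCKET_NAME",
    "GOOGLE_API_KEY",
)


def missing_env_vars(env=None) -> list:
    """Return the required variable names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


example_env = {
    "AWS_ACCESS_KEY_ID": "x",
    "AWS_SECRET_ACCESS_KEY": "y",
    "S3_BUCKET_NAME": "",
}
print(missing_env_vars(example_env))  # ['S3_BUCKET_NAME', 'GOOGLE_API_KEY']
```

Calling missing_env_vars() with no argument checks the real process environment, for example right after load_dotenv().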

Containerization with Docker

Use Docker for consistent application performance across environments.

Figure 7: AWS ECS used for container orchestration

Dockerfile (excerpt):

FROM python:3.9-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy application code
COPY app/ .

EXPOSE 8501

ENTRYPOINT ["streamlit", "run", "main.py", "--server.port=8501", "--server.address=0.0.0.0"]

Steps:

1. Build the image:

docker build -t visualinsight:latest .

2. Run the container:

docker run -p 8501:8501 visualinsight:latest

Visit http://localhost:8501 to use the app.

Infrastructure as Code with Terraform

Figure 8: AWS ECR, storing Docker images for the application

I use Terraform to create and manage the AWS resources needed to deploy the application: S3, ECR, ECS, and more.

Why Terraform?

Terraform allows you to define your cloud infrastructure as code. Rather than manually creating AWS resources via the console or CLI, you write a configuration file. This keeps your infrastructure consistent, version-controlled, and easily replicable across multiple environments.

Key Advantages of Using Terraform:

Reproducibility: The same configurations can be deployed multiple times without drift.
Collaboration: Teams can review Terraform files in Git, allowing for better code reviews and fewer mistakes.
Scalability: Quick spin-up of additional resources if your usage grows.

1. Example Variables (infrastructure/terraform/variables.tf)

variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

variable "bucket_name" {
  description = "Name of the S3 bucket"
  type        = string
}

2. Main Configuration (infrastructure/terraform/main.tf)

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

resource "aws_s3_bucket" "documents" {
  bucket = var.bucket_name
}

resource "aws_ecr_repository" "app" {
  name = "document-analyzer"
}

resource "aws_ecs_cluster" "main" {
  name = "document-analyzer-cluster"
}
# ... ECS Service, Security Groups, Task Definition, etc.

Why ECR and ECS?

Amazon ECR (Elastic Container Registry): A private registry for storing your Docker images. Instead of relying on Docker Hub or other third parties, ECR keeps your images secure within your AWS account.
Amazon ECS (Elastic Container Service): An AWS-native container orchestration service. It manages the scaling and deployment of your containerized application automatically. With Fargate (serverless compute engine for containers), you don’t have to worry about provisioning or managing EC2 instances; it abstracts away all the heavy lifting.

In Short:

ECR stores your built Docker images.
ECS pulls those images from ECR and runs them as containers in a scalable manner.

3. Deploying via Terraform

cd infrastructure/terraform
terraform init
terraform plan -var="bucket_name=my-visualinsight-bucket"
terraform apply -var="bucket_name=my-visualinsight-bucket"

Terraform will:

Create an S3 bucket.
Create an ECR repository.
Set up an ECS cluster, tasks, services, IAM roles, and more.

Automated CI/CD with GitHub Actions

Automate the build, test, and deployment process to ensure consistent updates.

Your .github/workflows/deploy.yml takes care of:

AWS Login: Authenticates with your AWS account using secrets.
Docker Build & Push: Builds the Docker image and pushes it to Amazon ECR.
ECS Update: Forces a new deployment on ECS to pull the latest image.

Figure 9: GitHub Actions

Figure 10: GitHub Actions pipeline for CI/CD

Sample Deploy Workflow:

name: Deploy to AWS

on:
  push:
    branches: [ main ]

jobs:
  deploy:
    runs-on: ubuntu-latest
    
    steps:
    - uses: actions/checkout@v2

    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v1
      with:
        aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
        aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        aws-region: us-east-1

    - name: Login to Amazon ECR
      id: login-ecr
      uses: aws-actions/amazon-ecr-login@v1

    - name: Build and push Docker image
      env:
        ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
        ECR_REPOSITORY: document-analyzer
        IMAGE_TAG: ${{ github.sha }}
      run: |
        docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
        docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG

    - name: Deploy to ECS
      run: |
        aws ecs update-service --cluster document-analyzer-cluster --service document-analyzer --force-new-deployment

Here’s a breakdown of what each step does and why it’s essential:

Workflow Trigger

on:
  push:
    branches: [ main ]

What It Does: The workflow is triggered whenever there’s a push to the main branch of the GitHub repository. Deployments therefore track main, so that branch should contain only stable, production-ready code.
Why It’s Important: Keeps the deployment pipeline consistent with the latest updates to the main codebase, avoiding manual deployment efforts.

Job Definition

jobs:
  deploy:
    runs-on: ubuntu-latest

What It Does: Defines a job named deploy that runs on the latest Ubuntu environment.
Why It’s Important: Ensures a consistent and reliable operating system environment for running the deployment steps.

Step 1: Check Out the Code

    - uses: actions/checkout@v2

What It Does: Clones the repository code into the runner.
Why It’s Important: Provides access to the most recent changes in the codebase, which are necessary for building and deploying the application.

Step 2: Configure AWS Credentials

    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v1
      with:
        aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
        aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        aws-region: us-east-1

What It Does: Authenticates the GitHub runner with AWS using securely stored credentials.
Why It’s Important: Allows the runner to interact with AWS services like ECR and ECS while ensuring security through encrypted credentials.

Step 3: Login to Amazon ECR

    - name: Login to Amazon ECR
      id: login-ecr
      uses: aws-actions/amazon-ecr-login@v1

What It Does: Logs into the Amazon Elastic Container Registry (ECR) to manage Docker images.
Why It’s Important: Provides access to the private ECR repository where Docker images are stored.

Step 4: Build and Push Docker Image

    - name: Build and push Docker image
      env:
        ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
        ECR_REPOSITORY: document-analyzer
        IMAGE_TAG: ${{ github.sha }}
      run: |
        docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
        docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG

What It Does:
Builds the Docker image from the application code.
Tags the image with the unique GitHub commit hash (github.sha) for versioning.
Pushes the image to Amazon ECR.
Why It’s Important: Ensures the application is containerized, version-controlled, and ready for deployment. Using ECR ensures secure and scalable image storage.

Step 5: Deploy to ECS

    - name: Deploy to ECS
      run: |
        aws ecs update-service --cluster document-analyzer-cluster --service document-analyzer --force-new-deployment

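What It Does: Forces a new deployment of the ECS service, so its tasks are restarted and the image referenced by the service’s task definition is pulled again. The new build is picked up only if that task definition points to a tag this workflow updates; the task definition itself is not shown here.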
Why It’s Important: Automatically deploys the latest version of the application to the containerized environment, ensuring the live service is always up-to-date.
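
The --force-new-deployment flag restarts tasks using the service’s current task definition. If that definition pins a specific image tag (an assumption, since the task definition is not shown in this article), rolling out the image tagged with the commit SHA would require a new task definition revision. The core of that step could look roughly like the sketch below; the family name, the container name "app", the account ID, and both helper functions are hypothetical.

```python
import copy


def image_uri(registry: str, repository: str, tag: str) -> str:
    """Compose the full image reference the workflow builds and pushes."""
    return f"{registry}/{repository}:{tag}"


def with_new_image(task_definition: dict, container_name: str, new_image: str) -> dict:
    """Return a copy of a task definition with one container's image replaced."""
    updated = copy.deepcopy(task_definition)  # leave the caller's dict untouched
    for container in updated.get("containerDefinitions", []):
        if container.get("name") == container_name:
            container["image"] = new_image
            return updated
    raise ValueError(f"No container named {container_name!r} in task definition")


# Hypothetical task definition; the real one lives in the Terraform code.
registry = "123456789012.dkr.ecr.us-east-1.amazonaws.com"
task_def = {
    "family": "document-analyzer",
    "containerDefinitions": [
        {"name": "app", "image": f"{registry}/document-analyzer:old"}
    ],
}

new_image = image_uri(registry, "document-analyzer", "abc1234")
updated = with_new_image(task_def, "app", new_image)
print(updated["containerDefinitions"][0]["image"])
```

With boto3, the updated definition could then be registered through the ECS client’s register_task_definition call and the service pointed at the new revision with update_service, after dropping the read-only fields that describe_task_definition returns.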

Whenever you push to main, GitHub Actions will build and deploy your latest changes automatically.

Real-World Impact

Time Efficiency: With AI-driven analysis, there’s no need for manual labeling or advanced ML pipeline setup.
Scalability: AWS S3 + ECS means you can handle ever-growing image datasets and traffic without re-architecting.
Reliability: Docker ensures consistent environments, Terraform standardizes infrastructure, and GitHub Actions automates testing and deployment.
User-Friendly: Streamlit’s intuitive UI means non-developers can upload images and see insights in real time.

Conclusion

VisualInsight takes the guesswork out of image analysis. By combining Streamlit, Google Generative AI (Gemini), AWS S3, Terraform and CI/CD, it delivers a robust, scalable solution that’s easy to use and maintain. VisualInsight streamlines the entire workflow — so you can focus on making discoveries, not wrestling with infrastructure.

Key Takeaways

Automation reduces manual work and simplifies processes.
Infrastructure as Code promotes collaboration and reproducibility.
Docker ensures consistency across development and production environments.
CI/CD enables fast and reliable updates.

Feel free to clone the GitHub repository (https://github.com/yotambraun/VisualInsight) and customize it for your own project needs.

Read the full article here: https://pub.towardsai.net/visualinsight-an-end-to-end-image-analysis-application-using-google-generative-ai-gemini-7847f6f0a1b9