Transform Image Data into Insights with VisualInsight’s AI Automation
Extracting insights from images can often feel challenging. Whether you’re a researcher, an analyst, or simply curious, efficiently analyzing and understanding images is crucial but not always straightforward. This is where VisualInsight comes in.
Figure 1: Gemini model
GitHub repository: yotambraun/VisualInsight (https://github.com/yotambraun/VisualInsight)
Challenges with Traditional Image Analysis Methods
- Manual Effort: Finding the right tools, writing custom scripts, and working with large datasets often involves significant manual work.
- Complexity: Navigating advanced algorithms, ML frameworks, or open-source projects can be overwhelming, especially for smaller teams.
- Storage and Security: Ensuring data is securely stored and easily retrievable adds another layer of complexity.
- Scaling: Handling larger datasets requires scalable infrastructure, which often involves high overhead.
VisualInsight addresses these challenges with a seamless and automated solution for image analysis.
Figure 2: Example of the user interface where you can upload images
As you can see, the UI simplifies the process. You just drag and drop your image, with no complicated scripts required.
Introducing VisualInsight
Core Idea
VisualInsight is a Streamlit-based web application that simplifies image analysis using Google Generative AI (Gemini). It incorporates AWS S3 for secure storage of original images and results.
Figure 3: Analysis results displayed in the Streamlit application
By automating much of the heavy lifting, VisualInsight ensures you spend less time on configuration and more time on innovation.
Key Components
- Streamlit UI: A user-friendly interface for uploading, viewing, and analyzing images.
- LLM Service (Google Gemini): Advanced text-based insights derived from images.
- AWS S3 Storage: Secure storage for files and AI-generated analyses.
- Docker & Terraform: Infrastructure for quick deployments and reproducibility.
- CI/CD via GitHub Actions: Automated builds, tests, and deployments for reliability.
How VisualInsight Works
1. Upload an Image: Drag and drop a JPG or PNG file onto the application.
2. AI Analysis with Google Gemini: The uploaded image is passed to the LLMService class, which uses Google’s Generative AI (Gemini) to generate descriptive insights about the image content.
Figure 4: Further analysis details being displayed to the user
3. Storage in AWS S3: Once analyzed, the application uploads both the original image and any analysis results to an S3 bucket for safekeeping.
4. Display Results: Insights are displayed in the application interface for immediate feedback.
Figure 5: Another view of the analysis interface
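The storage step mentions keeping analysis results next to the image. As a minimal sketch of how that could look, the function below writes an analysis dict as JSON through a boto3-style client. The function name store_analysis and the analysis/ key layout are hypothetical and not taken from the repository; put_object is the standard S3 client call.

```python
import json


def store_analysis(s3_client, bucket: str, image_key: str, analysis: dict) -> str:
    """Write the analysis dict as JSON and return the key it was stored under."""
    # Hypothetical layout: "uploads/x.png" -> "analysis/x.png.json"
    name = image_key.split("/", 1)[-1]
    analysis_key = f"analysis/{name}.json"
    s3_client.put_object(
        Bucket=bucket,
        Key=analysis_key,
        Body=json.dumps(analysis, indent=2).encode("utf-8"),
        ContentType="application/json",
    )
    return analysis_key
```

Passing the client in as an argument keeps the function easy to exercise with a stub, without touching a real bucket.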
Code Highlights
Below are some of the core services that power VisualInsight.
1. LLM Service (app/services/llm_service.py)
Handles the interaction with Google Gemini for image analysis.
import google.generativeai as genai
import os
from datetime import datetime
from PIL import Image
from utils.logger import setup_logger

logger = setup_logger()


class LLMService:
    def __init__(self):
        genai.configure(api_key=os.getenv('GOOGLE_API_KEY'))
        self.model = genai.GenerativeModel('gemini-1.5-flash-002')
        self.prompt = """
        Analyze this Image and provide:
        1. Image type
        2. Key information
        3. Important details
        4. Notable observations
        """

    def analyze_document(self, image: Image.Image) -> dict:
        try:
            logger.info("Sending request to LLM")
            # Generate content directly with the PIL image
            response = self.model.generate_content([
                self.prompt,
                image
            ])
            return {
                "analysis": response.text,
                "timestamp": datetime.now().isoformat()
            }
        except Exception as e:
            logger.error(f"LLM analysis failed: {str(e)}")
            raise Exception(f"Failed to analyze document: {str(e)}")
What’s Happening Here?
- The constructor configures Google Generative AI (Gemini) with an API key read from the GOOGLE_API_KEY environment variable.
- A default prompt outlines the kind of analysis we want.
- The analyze_document method sends the image to Gemini and returns its text-based analysis.
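To check the shape of the returned dictionary without calling Gemini, the model can be swapped for a stand-in. The sketch below is an illustration only: FakeModel and the free function analyze are hypothetical names, and analyze simply mirrors the body of analyze_document with the model passed in as a parameter.

```python
from datetime import datetime
from types import SimpleNamespace


class FakeModel:
    """Stand-in for genai.GenerativeModel that returns a canned response."""

    def __init__(self, text: str):
        self.text = text
        self.received = None

    def generate_content(self, parts):
        self.received = parts  # record what was sent, for later assertions
        return SimpleNamespace(text=self.text)


def analyze(model, prompt: str, image) -> dict:
    """Same return shape as analyze_document, with the model injected."""
    response = model.generate_content([prompt, image])
    return {"analysis": response.text, "timestamp": datetime.now().isoformat()}


fake = FakeModel("Image type: chart")
result = analyze(fake, "Analyze this image", image=object())
print(result["analysis"])
```

Because the fake records its input, a test can also confirm that the prompt and the image were sent together in one request.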
2. S3 Service (app/services/s3_service.py)
Uploads files to AWS S3 with timestamped keys and generates presigned URLs for private access.
import boto3
import os
from datetime import datetime
from utils.logger import setup_logger

logger = setup_logger()


class S3Service:
    def __init__(self):
        self.s3_client = boto3.client(
            's3',
            aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'),
            aws_secret_access_key=os.getenv('AWS_SECRET_ACCESS_KEY'),
            region_name=os.getenv('AWS_REGION', 'us-east-1')
        )
        self.bucket_name = os.getenv('S3_BUCKET_NAME')

    def upload_file(self, file):
        """Upload file to S3 and return the URL"""
        try:
            # Generate unique filename
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            file_key = f"uploads/{timestamp}_{file.name}"
            # Upload to S3
            self.s3_client.upload_fileobj(
                file,
                self.bucket_name,
                file_key
            )
            # Generate presigned URL that expires in 1 hour
            url = self.s3_client.generate_presigned_url(
                'get_object',
                Params={
                    'Bucket': self.bucket_name,
                    'Key': file_key
                },
                ExpiresIn=3600
            )
            logger.info(f"File uploaded successfully: {url}")
            return url
        except Exception as e:
            logger.error(f"S3 upload failed: {str(e)}")
            raise Exception(f"Failed to upload file to S3: {str(e)}")
Figure 6: The AWS S3 bucket that stores uploaded images and analysis results
Core Features:
- Uses boto3 to interact with AWS S3.
- Generates a time-stamped key for each file.
- Creates a presigned URL for private file access without requiring you to open up the entire bucket.
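A key built from a second-resolution timestamp and the original file name can collide when two files with the same name arrive within the same second, and the name itself may contain characters that are awkward in object keys. The sketch below shows one possible variant, assuming a short random suffix and a conservative character whitelist are acceptable. The helper build_upload_key is hypothetical and not part of the repository.

```python
import re
import uuid
from datetime import datetime
from typing import Optional


def build_upload_key(filename: str, now: Optional[datetime] = None) -> str:
    """Return an S3 key of the form uploads/<timestamp>_<suffix>_<safe name>."""
    now = now or datetime.now()
    timestamp = now.strftime("%Y%m%d_%H%M%S")
    # Keep only the base name, then replace characters outside a conservative set.
    base = filename.replace("\\", "/").rsplit("/", 1)[-1]
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", base) or "file"
    suffix = uuid.uuid4().hex[:8]  # disambiguates uploads within the same second
    return f"uploads/{timestamp}_{suffix}_{safe}"


print(build_upload_key("my photo.png", datetime(2025, 1, 1, 12, 0, 0)))
```

The timestamp prefix is kept so keys still sort chronologically, and the random suffix only matters when uploads coincide.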
3. The Streamlit Application (app/main.py)
Provides the user interface for file uploads, analysis initiation, and displaying results.
import streamlit as st
import os
from dotenv import load_dotenv
from services.s3_service import S3Service
from services.llm_service import LLMService
from utils.logger import setup_logger
from PIL import Image

# Load environment variables
load_dotenv()

# Setup logging
logger = setup_logger()

# Initialize services
s3_service = S3Service()
llm_service = LLMService()


def main():
    st.title("Document Analyzer")
    uploaded_file = st.file_uploader("Upload a document", type=['png', 'jpg', 'jpeg'])
    if uploaded_file:
        # Display image
        image = Image.open(uploaded_file)
        st.image(image, caption='Uploaded Document', use_column_width=True)
        if st.button('Analyze Document'):
            with st.spinner('Processing...'):
                try:
                    # Analyze with LLM directly
                    logger.info("Starting document analysis")
                    analysis = llm_service.analyze_document(image)
                    # Upload to S3 for storage
                    logger.info(f"Uploading file: {uploaded_file.name}")
                    # Image.open has already read from this stream, so rewind it
                    # to make sure the whole file is uploaded
                    uploaded_file.seek(0)
                    s3_url = s3_service.upload_file(uploaded_file)
                    # Display results
                    st.success("Analysis Complete!")
                    st.json(analysis)
                except Exception as e:
                    logger.error(f"Error processing document: {str(e)}")
                    st.error(f"Error: {str(e)}")


if __name__ == "__main__":
    main()
- Streamlit handles the UI: file upload, display, button triggers.
- LLMService and S3Service are orchestrated together to handle the AI query and file upload.
- Real-time logs inform you of the status and highlight any issues.
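One subtlety in this flow is that the same file object is first read by Image.open and later handed to the S3 upload. A file-like stream remembers its read position, so a second consumer may see only what is left after the first one. The snippet below illustrates the effect with a plain io.BytesIO standing in for the uploaded file. The byte payload is made up for the example.

```python
import io

payload = b"\x89PNG fake image bytes"
stream = io.BytesIO(payload)  # stands in for the uploaded file object

first_read = stream.read()   # a first consumer reads the stream to the end
second_read = stream.read()  # a later consumer starts at the current position

stream.seek(0)               # rewind before handing the stream to another consumer
after_rewind = stream.read()

print(len(first_read), len(second_read), len(after_rewind))
```

Here the second read returns nothing, while the read after seek(0) returns the full payload again. Rewinding before the upload is a cheap safeguard even when the first consumer reads only part of the file.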
Running VisualInsight Locally
1. Clone the Repository
git clone https://github.com/yotambraun/VisualInsight.git
cd VisualInsight
2. Environment Setup
Create a .env file at the project root:
AWS_ACCESS_KEY_ID=YOUR_AWS_KEY
AWS_SECRET_ACCESS_KEY=YOUR_AWS_SECRET
AWS_REGION=us-east-1
S3_BUCKET_NAME=YOUR_BUCKET_NAME
GOOGLE_API_KEY=YOUR_GOOGLE_GENAI_KEY
3. Install Dependencies
pip install -r requirements.txt
4. Run the App
streamlit run app/main.py
Navigate to http://localhost:8501 in your browser to start using VisualInsight!
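Because the services read their settings with os.getenv, a missing entry in .env surfaces only later as an authentication or upload error. A small preflight check can report the problem up front. The sketch below is an assumption about how such a check might look, and the helper missing_env_vars is hypothetical. AWS_REGION is left out of the required list because the S3 service falls back to us-east-1.

```python
import os
from typing import Iterable, List, Mapping

REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "S3_BUCKET_NAME",
    "GOOGLE_API_KEY",
)


def missing_env_vars(required: Iterable[str] = REQUIRED_VARS,
                     environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in required if not environ.get(name, "").strip()]


# Illustrative environment: one value present, one blank, two absent.
example_env = {"AWS_ACCESS_KEY_ID": "x", "GOOGLE_API_KEY": " "}
print(missing_env_vars(environ=example_env))
# → ['AWS_SECRET_ACCESS_KEY', 'S3_BUCKET_NAME', 'GOOGLE_API_KEY']
```

In the app, a check like this would run after load_dotenv() and before the services are constructed, so the error message names the variable instead of the downstream failure.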
Containerization with Docker
Use Docker for consistent application performance across environments.
Figure 7: AWS ECS used for container orchestration
Dockerfile (excerpt):
FROM python:3.9-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt
# Copy application code
COPY app/ .
EXPOSE 8501
ENTRYPOINT ["streamlit", "run", "main.py", "--server.port=8501", "--server.address=0.0.0.0"]
Steps:
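- Build:
docker build -t visualinsight:latest .
- Run (passing the .env file so the container receives the AWS and Google credentials, which are not copied into the image):
docker run --env-file .env -p 8501:8501 visualinsight:latest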
Visit http://localhost:8501 to use the app.
Infrastructure as Code with Terraform
Figure 8: AWS ECR, storing Docker images for the application
I use Terraform to create and manage the AWS resources needed to deploy the application: S3, ECR, ECS, and more.
Why Terraform?
Terraform allows you to define your cloud infrastructure as code. Rather than manually creating AWS resources via the console or CLI, you write a configuration file. This keeps your infrastructure consistent, version-controlled, and easily replicable across multiple environments.
Key Advantages of Using Terraform:
- Reproducibility: The same configurations can be deployed multiple times without drift.
- Collaboration: Teams can review Terraform files in Git, allowing for better code reviews and fewer mistakes.
- Scalability: Quick spin-up of additional resources if your usage grows.
1. Example Variables (infrastructure/terraform/variables.tf)
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

variable "bucket_name" {
  description = "Name of the S3 bucket"
  type        = string
}
2. Main Configuration (infrastructure/terraform/main.tf)
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

resource "aws_s3_bucket" "documents" {
  bucket = var.bucket_name
}

resource "aws_ecr_repository" "app" {
  name = "document-analyzer"
}

resource "aws_ecs_cluster" "main" {
  name = "document-analyzer-cluster"
}

# ... ECS Service, Security Groups, Task Definition, etc.
Why ECR and ECS?
- Amazon ECR (Elastic Container Registry): A private registry for storing your Docker images. Instead of relying on Docker Hub or other third parties, ECR keeps your images secure within your AWS account.
- Amazon ECS (Elastic Container Service): An AWS-native container orchestration service. It manages the scaling and deployment of your containerized application automatically. With Fargate (serverless compute engine for containers), you don’t have to worry about provisioning or managing EC2 instances; it abstracts away all the heavy lifting.
In Short:
- ECR stores your built Docker images.
- ECS pulls those images from ECR and runs them as containers in a scalable manner.
3. Deploying via Terraform
cd infrastructure/terraform
terraform init
terraform plan -var="bucket_name=my-visualinsight-bucket"
terraform apply -var="bucket_name=my-visualinsight-bucket"
Terraform will:
- Create an S3 bucket.
- Create an ECR repository.
- Set up an ECS cluster, tasks, services, IAM roles, and more.
Automated CI/CD with GitHub Actions
Automate the build, test, and deployment process to ensure consistent updates.
Your .github/workflows/deploy.yml takes care of:
- AWS Login: Authenticates with your AWS account using secrets.
- Docker Build & Push: Builds the Docker image and pushes it to Amazon ECR.
- ECS Update: Forces a new deployment on ECS to pull the latest image.
Figure 9: GitHub Actions
Figure 10: GitHub Actions pipeline for CI/CD
Sample Deploy Workflow:
name: Deploy to AWS
on:
  push:
    branches: [ main ]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v1
      - name: Build and push Docker image
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          ECR_REPOSITORY: document-analyzer
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
      - name: Deploy to ECS
        run: |
          aws ecs update-service --cluster document-analyzer-cluster --service document-analyzer --force-new-deployment
Here’s a breakdown of what each step does and why it’s essential:
Workflow Trigger
on:
  push:
    branches: [ main ]
- What It Does: The workflow is triggered on every push to the main branch of the GitHub repository. Deployment therefore tracks main, so this setup assumes that only reviewed, production-ready code is merged there.
- Why It’s Important: Keeps the deployment pipeline consistent with the latest updates to the main codebase, avoiding manual deployment efforts.
Job Definition
jobs:
  deploy:
    runs-on: ubuntu-latest
- What It Does: Defines a job named deploy that runs on the latest Ubuntu environment.
- Why It’s Important: Ensures a consistent and reliable operating system environment for running the deployment steps.
Step 1: Check Out the Code
- uses: actions/checkout@v2
- What It Does: Clones the repository code into the runner.
- Why It’s Important: Provides access to the most recent changes in the codebase, which are necessary for building and deploying the application.
Step 2: Configure AWS Credentials
- name: Configure AWS credentials
  uses: aws-actions/configure-aws-credentials@v1
  with:
    aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
    aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    aws-region: us-east-1
- What It Does: Authenticates the GitHub runner with AWS using securely stored credentials.
- Why It’s Important: Allows the runner to interact with AWS services like ECR and ECS while ensuring security through encrypted credentials.
Step 3: Login to Amazon ECR
- name: Login to Amazon ECR
  id: login-ecr
  uses: aws-actions/amazon-ecr-login@v1
- What It Does: Logs into the Amazon Elastic Container Registry (ECR) to manage Docker images.
- Why It’s Important: Provides access to the private ECR repository where Docker images are stored.
Step 4: Build and Push Docker Image
- name: Build and push Docker image
  env:
    ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
    ECR_REPOSITORY: document-analyzer
    IMAGE_TAG: ${{ github.sha }}
  run: |
    docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
    docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
- What It Does:
  - Builds the Docker image from the application code.
  - Tags the image with the unique GitHub commit hash (github.sha) for versioning.
  - Pushes the image to Amazon ECR.
- Why It’s Important: Ensures the application is containerized, version-controlled, and ready for deployment. Using ECR ensures secure and scalable image storage.
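To make the tagging in this step concrete, the sketch below composes the full image reference that the build and push commands end up using. The helper ecr_image_uri is hypothetical, and the account ID and short commit SHA are placeholders. The registry host follows the standard ECR form of account ID, region, and amazonaws.com domain.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Compose the full image reference used by docker build and docker push."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repository}:{tag}"


# Placeholder account ID and commit SHA, for illustration only.
print(ecr_image_uri("123456789012", "us-east-1", "document-analyzer", "3f2a9c1"))
# → 123456789012.dkr.ecr.us-east-1.amazonaws.com/document-analyzer:3f2a9c1
```

Since the tag is the commit hash, every push produces a distinct, traceable image reference rather than overwriting a previous one.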
Step 5: Deploy to ECS
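- name: Deploy to ECS
  run: |
    aws ecs update-service --cluster document-analyzer-cluster --service document-analyzer --force-new-deployment
- What It Does: Forces the ECS service to start new tasks, which pull the image referenced by the service's task definition. Note that the build step pushes only a commit-SHA tag, so the task definition must either reference a tag that the workflow also updates (for example, a mutable latest tag) or be re-registered with the new SHA. Otherwise the redeployed tasks keep running the previous image.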
- Why It’s Important: Automatically deploys the latest version of the application to the containerized environment, ensuring the live service is always up-to-date.
Whenever you push to main, GitHub Actions will build and deploy your latest changes automatically.
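The same redeploy can be triggered from Python, which is handy for scripts outside the workflow. The sketch below assumes a boto3 ECS client is passed in. The wrapper force_redeploy is a hypothetical name, while update_service with forceNewDeployment is the boto3 counterpart of the CLI flag used in the workflow.

```python
def force_redeploy(ecs_client, cluster: str, service: str) -> dict:
    """Ask ECS to start fresh tasks for the service without changing its definition."""
    return ecs_client.update_service(
        cluster=cluster,
        service=service,
        forceNewDeployment=True,
    )


# Example call (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# ecs = boto3.client("ecs", region_name="us-east-1")
# force_redeploy(ecs, "document-analyzer-cluster", "document-analyzer")
```

Taking the client as a parameter keeps the function testable with a stub and leaves region and credential handling to the caller.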
Real-World Impact
- Time Efficiency: With AI-driven analysis, there’s no need for manual labeling or advanced ML pipeline setup.
- Scalability: AWS S3 and ECS let you handle growing image datasets and traffic without re-architecting.
- Reliability: Docker ensures consistent environments, Terraform standardizes infrastructure, and GitHub Actions automates builds and deployment.
- User-Friendly: Streamlit’s intuitive UI means non-developers can upload images and see insights in real time.
Conclusion
VisualInsight takes the guesswork out of image analysis. By combining Streamlit, Google Generative AI (Gemini), AWS S3, Terraform and CI/CD, it delivers a robust, scalable solution that’s easy to use and maintain. VisualInsight streamlines the entire workflow — so you can focus on making discoveries, not wrestling with infrastructure.
Key Takeaways
- Automation reduces manual work and simplifies processes.
- Infrastructure as Code promotes collaboration and reproducibility.
- Docker ensures consistency across development and production environments.
- CI/CD enables fast and reliable updates.
Feel free to clone the GitHub repository (https://github.com/yotambraun/VisualInsight) and customize it for your own project needs.
Read the full article here: https://pub.towardsai.net/visualinsight-an-end-to-end-image-analysis-application-using-google-generative-ai-gemini-7847f6f0a1b9