A comprehensive Go microservice that fetches, consolidates, and analyzes hotel data from multiple sources. Features multi-source data aggregation, review crawling, and LLM-powered recommendation analysis.
Architecture
Alpaca is a single microservice that:
- Fetches hotel data from multiple sources (Amadeus, Expedia, Tripadvisor, Google, Booking.com)
- Consolidates hotel data into a unified schema
- Crawls reviews from multiple sources (Tripadvisor, Google, Expedia, Booking, hotel websites, etc.)
- Uses an LLM (GPT-4, Claude, or Grok) to analyze reviews for quality and quietness
- Generates intelligent recommendations based on review analysis
- Stores data in SQLite (default) with raw SQL
- Uses a generalized provider interface for easy API integration
- Processes data in concurrent batches with rate limiting
Project Structure
alpaca/
├── alpaca/
│ ├── main.go # Main entry point - hotel data worker
│ ├── generate_cities.go # City data generation utility (reference)
│ ├── generated_top_cities.go # Generated top cities data (reference)
│ ├── REVIEW_PROCESSING.md # Review processing documentation
│ ├── models/
│ │ ├── hotel.go # Original Amadeus hotel models
│ │ └── hotel_extended.go # Extended hotel models with recommendations
│ ├── services/
│ │ ├── hotel_service.go # Hotel business logic (Amadeus)
│ │ ├── hotel_service_extended.go # Extended hotel service (multi-source)
│ │ ├── review_crawler.go # Review crawling from multiple sources
│ │ ├── llm_service.go # LLM integration (GPT-4, Claude, Grok)
│ │ └── recommendation_service.go # Recommendation orchestration
│ ├── database/
│ │ └── database.go # SQLite database connection and schema
│ └── utils/
│ └── constants.go # Constants and test data
├── go.mod # Go module definition
├── Dockerfile # Docker build configuration
└── README.md # This file
Features
✅ Simplified Architecture
- Single Microservice: One focused service for hotel data collection
- Raw SQL: No ORM overhead, direct SQL control
- SQLite First: Simple, file-based database (easy to migrate to Postgres/Redshift later)
- Generalized API Interface: Easy to add new hotel data providers
✅ Multi-Source Hotel Data Collection
- Amadeus API: Hotel list, search, and ratings data
- Expedia: Hotel listings and reviews (interface ready)
- Tripadvisor: Hotel data and reviews (interface ready)
- Google Places: Hotel data and reviews (interface ready)
- Booking.com: Hotel data and reviews (interface ready)
- Consolidated Schema: Unified hotel table with ratings from all sources
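A unified table holding ratings from every source needs a rule for combining them. The sketch below shows one possible rule: normalize every rating to a 0 to 100 scale, then weight it by review count. The SourceRating type, its fields, and consolidateRating are hypothetical. The scales used in main (100 for a sentiment-style score, 5 for a star rating) are illustrative assumptions, not values taken from the service.

```go
package main

import "fmt"

// SourceRating is a hypothetical per-source rating record.
type SourceRating struct {
	Source     string  // e.g. "amadeus", "tripadvisor"
	Rating     float64 // rating on the source's own scale
	MaxRating  float64 // top of that scale, e.g. 100, 5 or 10
	NumReviews int     // number of reviews behind the rating
}

// consolidateRating normalizes every rating to 0-100 and weights it by
// review count. The bool is false when no usable rating is present.
func consolidateRating(ratings []SourceRating) (float64, bool) {
	var weightedSum float64
	var totalReviews int
	for _, r := range ratings {
		if r.MaxRating <= 0 || r.NumReviews <= 0 {
			continue // skip malformed or empty entries
		}
		normalized := r.Rating / r.MaxRating * 100
		weightedSum += normalized * float64(r.NumReviews)
		totalReviews += r.NumReviews
	}
	if totalReviews == 0 {
		return 0, false
	}
	return weightedSum / float64(totalReviews), true
}

func main() {
	score, ok := consolidateRating([]SourceRating{
		{Source: "amadeus", Rating: 80, MaxRating: 100, NumReviews: 300},
		{Source: "tripadvisor", Rating: 4.5, MaxRating: 5, NumReviews: 100},
	})
	fmt.Printf("%.1f %v\n", score, ok) // 82.5 true
}
```

Weighting by review count keeps a source with only a handful of reviews from pulling the consolidated score as hard as a source with hundreds.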
✅ Review Processing & LLM Analysis
- Multi-Source Review Crawling: Automatically fetches reviews from:
  - Tripadvisor, Google, Expedia, Booking.com
  - Hotel websites, Bing, Yelp
- LLM-Powered Analysis: Uses GPT-4, Claude, or Grok to analyze reviews
- Quality Detection: Identifies hotels with excellent service, cleanliness, amenities
- Quiet Detection: Identifies quiet, peaceful hotels away from noise
- Intelligent Recommendations: Combines quality and quiet analysis for recommendations
- Admin Override: Admin flag to enable/disable hotels regardless of analysis
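Below is a minimal sketch of how the quality verdict, the quiet verdict, and the admin flag might combine. It assumes a hotel is recommended only when both verdicts are positive, and that an admin flag, when set, always wins. ReviewAnalysis and isRecommended are hypothetical names. The *bool models the override as tri-state: unset, forced on, or forced off.

```go
package main

import "fmt"

// ReviewAnalysis is a hypothetical summary of the LLM verdicts for one hotel.
type ReviewAnalysis struct {
	IsQuality bool
	IsQuiet   bool
}

// isRecommended applies the admin override first (nil means "no override"),
// then requires both the quality and the quiet verdicts.
func isRecommended(a ReviewAnalysis, adminOverride *bool) bool {
	if adminOverride != nil {
		return *adminOverride
	}
	return a.IsQuality && a.IsQuiet
}

func main() {
	disabled := false
	fmt.Println(isRecommended(ReviewAnalysis{IsQuality: true, IsQuiet: true}, nil))       // true
	fmt.Println(isRecommended(ReviewAnalysis{IsQuality: true, IsQuiet: false}, nil))      // false
	fmt.Println(isRecommended(ReviewAnalysis{IsQuality: true, IsQuiet: true}, &disabled)) // false
}
```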
✅ Advanced Processing
- Proper Pagination: Handles multi-page API responses automatically
- Concurrent Processing: Uses goroutines for parallel data fetching
- Rate Limiting: Respects API limits with configurable delays
- Error Handling: Graceful degradation and detailed error logging
- Invalid ID Tracking: Skips hotel IDs that consistently fail
🚀 Getting Started
Prerequisites
- Go 1.23+
- Amadeus API credentials (test or production)
Environment Variables
Create a .env file in the project root:
# Amadeus API Credentials
AMD=your_client_id
AMS=your_client_secret
# Optional: Override default API URLs
AMADEUS_HOTEL_LIST_URL=https://test.api.amadeus.com/v1/reference-data/locations/hotels/by-city
AMADEUS_HOTEL_SEARCH_URL=https://test.api.amadeus.com/v2/shopping/hotel-offers
AMADEUS_HOTEL_RATINGS_URL=https://test.api.amadeus.com/v2/e-reputation/hotel-sentiments
# Optional: Database path (defaults to ./alpaca.db)
SQLITE_DB_PATH=./alpaca.db
# Optional: Search radius configuration
HOTEL_SEARCH_RADIUS=100
HOTEL_SEARCH_RADIUS_UNIT=MILE
Running the Service
go build -o alpaca ./alpaca/alpaca
./alpaca
Or from the alpaca/alpaca directory:
cd alpaca/alpaca
go build -o alpaca
./alpaca
The service will:
- Connect to the SQLite database (creating it if it doesn't exist)
- Fetch the hotel list for Austin, TX (the default city)
- Fetch detailed search data for every hotel in that list
- Fetch ratings data for the test hotel IDs
Database Schema
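The service uses a simple relational schema with three main tables, plus a fourth table that tracks invalid hotel IDs. Nested API objects are stored as JSON in TEXT columns: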
-- Basic hotel information
CREATE TABLE hotels (
id INTEGER PRIMARY KEY AUTOINCREMENT,
hotel_id TEXT UNIQUE NOT NULL,
type TEXT,
chain_code TEXT,
dupe_id INTEGER,
name TEXT,
iata_code TEXT,
address TEXT, -- JSON stored as TEXT
geo_code TEXT, -- JSON stored as TEXT
distance TEXT, -- JSON stored as TEXT
last_update TEXT,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
-- Detailed hotel metadata
CREATE TABLE hotel_search_data (
id INTEGER PRIMARY KEY AUTOINCREMENT,
hotel_id TEXT UNIQUE NOT NULL,
type TEXT,
chain_code TEXT,
dupe_id INTEGER,
name TEXT,
rating INTEGER,
official_rating INTEGER,
description TEXT, -- JSON stored as TEXT
media TEXT, -- JSON stored as TEXT
amenities TEXT, -- JSON stored as TEXT
address TEXT, -- JSON stored as TEXT
contact TEXT, -- JSON stored as TEXT
policies TEXT, -- JSON stored as TEXT
available INTEGER DEFAULT 0,
offers TEXT, -- JSON stored as TEXT
self TEXT,
hotel_distance TEXT, -- JSON stored as TEXT
last_update TEXT,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (hotel_id) REFERENCES hotels(hotel_id)
);
-- Guest ratings and sentiment
CREATE TABLE hotel_ratings_data (
id INTEGER PRIMARY KEY AUTOINCREMENT,
hotel_id TEXT UNIQUE NOT NULL,
type TEXT,
number_of_reviews INTEGER,
number_of_ratings INTEGER,
overall_rating INTEGER,
sentiments TEXT, -- JSON stored as TEXT
last_update TEXT,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (hotel_id) REFERENCES hotels(hotel_id)
);
-- Track invalid hotel IDs to skip in future runs
CREATE TABLE invalid_hotel_search_ids (
id INTEGER PRIMARY KEY AUTOINCREMENT,
hotel_id TEXT UNIQUE NOT NULL,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
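Because hotel_id is UNIQUE, each table can be written with an upsert, and nested API objects can be serialized to JSON before they go into TEXT columns. The sketch below covers only a few hotels columns. Address and its JSON field names are hypothetical stand-ins. A SQLite driver must be registered separately, since none is imported here. ON CONFLICT ... DO UPDATE requires SQLite 3.24 or newer.

```go
package main

import (
	"context"
	"database/sql"
	"encoding/json"
	"fmt"
)

// Address is a hypothetical stand-in for the nested address object that the
// hotels table stores as JSON text.
type Address struct {
	CityName    string   `json:"cityName,omitempty"`
	CountryCode string   `json:"countryCode,omitempty"`
	Lines       []string `json:"lines,omitempty"`
}

// toJSONText serializes a nested value for storage in a TEXT column.
func toJSONText(v any) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// upsertHotelSQL inserts a hotel or, if hotel_id already exists, updates it
// in place and refreshes updated_at.
const upsertHotelSQL = `
INSERT INTO hotels (hotel_id, name, chain_code, address)
VALUES (?, ?, ?, ?)
ON CONFLICT(hotel_id) DO UPDATE SET
	name       = excluded.name,
	chain_code = excluded.chain_code,
	address    = excluded.address,
	updated_at = CURRENT_TIMESTAMP`

// upsertHotel writes one row. db must come from sql.Open with a SQLite
// driver that has been registered elsewhere.
func upsertHotel(ctx context.Context, db *sql.DB, hotelID, name, chainCode string, addr Address) error {
	addrJSON, err := toJSONText(addr)
	if err != nil {
		return fmt.Errorf("encode address: %w", err)
	}
	_, err = db.ExecContext(ctx, upsertHotelSQL, hotelID, name, chainCode, addrJSON)
	return err
}

func main() {
	s, _ := toJSONText(Address{CityName: "AUSTIN", CountryCode: "US"})
	fmt.Println(s) // {"cityName":"AUSTIN","countryCode":"US"}
}
```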
API Provider Interface
The service uses a generalized HotelAPIProvider interface, making it easy to add new hotel data sources:
type HotelAPIProvider interface {
GetOAuthToken(ctx context.Context) (string, error)
FetchHotelsList(ctx context.Context, cityCode string, token string) ([]models.HotelAPIItem, string, error)
FetchHotelSearchData(ctx context.Context, hotelID string, token string) (*models.HotelSearchData, error)
FetchHotelRatingsData(ctx context.Context, hotelID string, token string) (*models.HotelRatingsData, error)
}
Currently implemented:
- AmadeusProvider: Full Amadeus API integration
Future providers can be added by implementing this interface.
Data Flow
- OAuth Token: The service authenticates with the Amadeus API and obtains an access token
- Hotel List: Fetches basic hotel data by city, following pagination
- Hotel IDs: Extracts all hotel IDs for detailed processing
- Search Data: Concurrently fetches detailed hotel metadata (up to 5 requests in flight)
- Ratings Data: Fetches ratings and sentiment data one request at a time to stay within rate limits
Performance Features
- Concurrent Processing: Goroutines fetch search data up to 5 requests at a time instead of sequentially
- Rate Limiting: 100 to 200 ms delays between requests keep traffic within API limits
- Pagination Handling: Large result sets are retrieved page by page
- Database Indexing: Indexed lookups on hotel_id (SQLite automatically indexes UNIQUE columns)
- Error Recovery: Graceful handling of API failures
- Invalid ID Tracking: Skips problematic hotel IDs automatically
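Invalid ID tracking can be as simple as loading the IDs from invalid_hotel_search_ids into a set and filtering before each run. filterInvalid is a hypothetical helper. The INSERT OR IGNORE statement relies on the table's UNIQUE constraint, so recording the same failure twice is harmless.

```go
package main

import "fmt"

// filterInvalid drops IDs previously recorded as invalid, preserving order.
// The set would be loaded with SELECT hotel_id FROM invalid_hotel_search_ids.
func filterInvalid(ids []string, invalid map[string]struct{}) []string {
	kept := make([]string, 0, len(ids))
	for _, id := range ids {
		if _, bad := invalid[id]; !bad {
			kept = append(kept, id)
		}
	}
	return kept
}

// recordInvalidSQL is idempotent thanks to the UNIQUE constraint on hotel_id.
const recordInvalidSQL = `INSERT OR IGNORE INTO invalid_hotel_search_ids (hotel_id) VALUES (?)`

func main() {
	invalid := map[string]struct{}{"H2": {}}
	fmt.Println(filterInvalid([]string{"H1", "H2", "H3"}, invalid)) // [H1 H3]
	fmt.Println(recordInvalidSQL)
}
```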
Review Processing
See REVIEW_PROCESSING.md for detailed documentation on:
- Review crawling from multiple sources
- LLM analysis for Quality and Quiet detection
- Recommendation generation
- Usage examples
Quick Start - Review Processing
// Initialize services (assumes imports of context, log, os, and the
// project's database and services packages)
ctx := context.Background()
db, err := database.NewDatabase()
if err != nil {
	log.Fatalf("open database: %v", err)
}
hotelService := services.NewHotelService(db)
reviewCrawler := services.NewReviewCrawlerService(db)
llmProvider := services.NewOpenAIProvider(os.Getenv("OPENAI_API_KEY"))
llmService := services.NewLLMService(llmProvider)
recommendationService := services.NewRecommendationService(
	hotelService, reviewCrawler, llmService,
)
// Process recommendations for a hotel
if err := recommendationService.ProcessHotelRecommendations(ctx, "hotel-id"); err != nil {
	log.Printf("process recommendations: %v", err)
}
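REVIEW_PROCESSING.md documents the actual LLM integration. The sketch below only illustrates one defensive way to consume a model's verdict, assuming the prompt asks for a JSON object. The LLMVerdict schema, its field names, and parseVerdict are hypothetical and may differ from llm_service.go.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// LLMVerdict is a hypothetical response schema the prompt could ask the
// model to follow.
type LLMVerdict struct {
	IsQuality  bool    `json:"is_quality"`
	IsQuiet    bool    `json:"is_quiet"`
	Confidence float64 `json:"confidence"`
	Reasoning  string  `json:"reasoning"`
}

// parseVerdict decodes the text between the first '{' and the last '}' of
// the reply (models sometimes wrap JSON in prose) and validates confidence.
func parseVerdict(reply string) (LLMVerdict, error) {
	var v LLMVerdict
	start := strings.Index(reply, "{")
	end := strings.LastIndex(reply, "}")
	if start < 0 || end < start {
		return v, fmt.Errorf("no JSON object in reply")
	}
	if err := json.Unmarshal([]byte(reply[start:end+1]), &v); err != nil {
		return v, fmt.Errorf("decode verdict: %w", err)
	}
	if v.Confidence < 0 || v.Confidence > 1 {
		return v, fmt.Errorf("confidence %v out of range [0,1]", v.Confidence)
	}
	return v, nil
}

func main() {
	reply := "Here is my analysis:\n" +
		`{"is_quality": true, "is_quiet": false, "confidence": 0.8, "reasoning": "Great staff, street noise at night."}`
	v, err := parseVerdict(reply)
	fmt.Printf("%+v %v\n", v, err)
}
```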
Next Steps & Recommendations
Database Backend Options
SQLite (Current): Good for development and small datasets
- Pros: Simple, no server needed, fast for reads
- Cons: Limited concurrency, not ideal for high write loads
PostgreSQL: Recommended for production
- Pros: Better write concurrency, native JSONB type, full SQL features
- Cons: Requires server setup
AWS Redshift: For analytics workloads
- Pros: Columnar storage, optimized for analytics
- Cons: More complex setup; tuned for read-heavy analytics rather than frequent small writes
MongoDB: If you need document flexibility
- Pros: Native JSON, flexible schema
- Cons: Different query model, may need to rethink relationships
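Recommendation: Start with SQLite for development and migrate to PostgreSQL for production. The raw SQL approach keeps migration straightforward, though SQLite-specific syntax still needs adjusting: AUTOINCREMENT becomes an identity or SERIAL column, DATETIME becomes TIMESTAMP, ? placeholders become $1, $2, and JSON-as-TEXT columns can become JSONB.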
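Placeholder syntax is one concrete migration step. SQLite drivers typically accept ? placeholders, while PostgreSQL drivers such as lib/pq and pgx use $1, $2, and so on. The hypothetical helper below rewrites them mechanically. It is a sketch, not a SQL parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// toPostgresPlaceholders rewrites SQLite-style "?" placeholders into
// PostgreSQL's numbered "$1", "$2", ... form. It is deliberately naive: a "?"
// inside a string literal, a comment, or PostgreSQL's JSONB "?" operator
// would also be rewritten.
func toPostgresPlaceholders(query string) string {
	var b strings.Builder
	n := 0
	for _, r := range query {
		if r == '?' {
			n++
			b.WriteByte('$')
			b.WriteString(strconv.Itoa(n))
			continue
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	q := "INSERT INTO hotels (hotel_id, name) VALUES (?, ?)"
	fmt.Println(toPostgresPlaceholders(q))
	// INSERT INTO hotels (hotel_id, name) VALUES ($1, $2)
}
```

Keeping separate query constants per backend avoids the literal and operator pitfalls noted in the comment, at the cost of some duplication.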
Code Simplification Opportunities
Struct Simplification:
- Consider flattening some nested JSON structures
- Remove unused fields from API responses
- Create separate structs for database vs API models
Database Code:
- Tune the database/sql connection pool (SetMaxOpenConns, SetMaxIdleConns, SetConnMaxLifetime)
- Implement prepared statements for better performance
- Add transaction support for batch operations
Error Handling:
- Create custom error types for better error handling
- Add retry logic with exponential backoff
- Implement circuit breaker pattern for API calls
Configuration:
- Move hardcoded values to config file
- Add validation for environment variables
- Support multiple city codes
Testing:
- Add unit tests for database operations
- Add integration tests for API provider
- Mock external API calls for testing
Development
Building
go build -o alpaca ./alpaca/alpaca
Running Tests
go test ./...
Code Style
The project follows standard Go conventions:
- Use gofmt for formatting
- Use golint for linting (golint is deprecated; go vet and staticcheck are the maintained alternatives)
- Follow Go naming conventions
License
MIT