Go for Data Science – High-Performance Data Pipelines & APIs

Tutorial Time: 9.30 - 13.30

Level: Beginner to Intermediate

Prerequisites: Basic programming knowledge (Python preferred) and a laptop with Go installed.

Presenter: Kiettiphong Manovisut

  • Onsite Registration Fee: 7,500 Baht
  • Online Registration Fee: 5,000 Baht

Abstract:

While Python remains the king of Exploratory Data Analysis (EDA) and model training, Go (Golang) has emerged as a preferred language for production-grade data engineering. Its efficient handling of large datasets, built-in concurrency, and fast compiled execution make it an ideal companion for data scientists moving models into production.

This hands-on workshop is designed specifically for Data Science students and practitioners. Participants will transition from Pythonic thinking to Go’s statically typed paradigm, culminating in the creation of a high-performance REST API that serves data processed in Go.

Learning Objectives:

  • Understand the performance trade-offs between Python and Go in a data context.
  • Implement core Go data structures (Slices, Maps, Structs) to handle tabular data.
  • Use Go’s concurrency primitives (goroutines and channels) to accelerate data ingestion.
  • Develop and deploy a production-ready REST API using the standard library.

Tutorial Outline:

    Module 1: The Go Advantage (40 min)
  • The Hybrid Workflow: When to use Go vs. Python in the ML lifecycle.
  • Environment Setup: Configuring the Go workspace and VS Code integration.
  • Project Initialization: Mastering go mod and the Go project structure.
    Module 2: Strong Typing for Data Integrity (50 min)
  • From Dynamic to Static: Variables, type inference, and compile-time type safety.
  • Data Structures: Using Slices and Maps for data manipulation.
  • Object Modeling: Leveraging Structs as "Data Classes."
  • [Lab 1]: Building a statistical calculator (Mean/Std Dev) from scratch.
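A minimal sketch of what the Lab 1 calculator could look like (an illustration, not the workshop's reference solution). It models the result as a struct used as a "data class", stores hypothetical per-student scores in a map of slices, and uses the sample standard deviation (dividing by n-1); that choice is an assumption, and the lab may use the population form instead. The names Summary and Describe are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// Summary is a struct acting as a "data class" for descriptive statistics.
type Summary struct {
	Count  int
	Mean   float64
	StdDev float64 // sample standard deviation (n-1 denominator, an assumption)
}

// Describe computes the mean and sample standard deviation of xs.
// It returns an error for fewer than two values, since the sample
// standard deviation is undefined for n < 2.
func Describe(xs []float64) (Summary, error) {
	n := len(xs)
	if n < 2 {
		return Summary{}, errors.New("need at least two values")
	}
	var sum float64
	for _, x := range xs {
		sum += x
	}
	mean := sum / float64(n)

	// Sum of squared deviations from the mean.
	var sq float64
	for _, x := range xs {
		d := x - mean
		sq += d * d
	}
	return Summary{Count: n, Mean: mean, StdDev: math.Sqrt(sq / float64(n-1))}, nil
}

func main() {
	// Hypothetical data: student name -> slice of scores.
	scores := map[string][]float64{
		"alice": {80, 90, 100},
		"bob":   {70, 75},
	}
	// Map iteration order is random in Go, so output order may vary.
	for name, xs := range scores {
		s, err := Describe(xs)
		if err != nil {
			fmt.Println(name, "error:", err)
			continue
		}
		fmt.Printf("%s: n=%d mean=%.2f sd=%.2f\n", name, s.Count, s.Mean, s.StdDev)
	}
}
```

Returning (Summary, error) instead of panicking previews the val, err pattern covered in Module 3.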
    Module 3: Robust Data Handling (50 min)
  • Error Handling: The val, err return pattern vs. Python’s try/except.
  • File I/O: Reading and parsing CSV datasets efficiently.
  • Data Serialization: Mapping JSON to Structs for web transmission.
  • [Lab 2]: Transforming raw CSV datasets into structured JSON objects.
    Module 4: Parallelism & High-Throughput (40 min)
  • Goroutines: Lightweight concurrent functions scheduled by the Go runtime, with far less overhead than OS threads.
  • Channels: Safe communication and synchronization between concurrent data processes.
  • [Lab 3]: Building a concurrent data scraper to fetch records from multiple sources.
    Module 5: Serving Data at Scale (40 min)
  • Web Services: Creating endpoints with net/http.
  • API Design: Routing, query parameters, and JSON responses.
  • [Final Project]: Build a "Top-N Student Scores" API that processes data on-the-fly.
    Recommended Resources
  • Go by Example (gobyexample.com)
  • A Tour of Go (go.dev/tour)
  • Notable libraries: Gin (web), Gonum (numerical), Apache Arrow Go (columnar in-memory data)