Halo Media

LLM Data Engineer | United States | Fully Remote

Sorry, this job was removed at 06:05 p.m. (CST) on Sunday, Apr 06, 2025
Remote
Hiring Remotely in United States

Description

We are seeking an experienced AI/LLM Data Engineer to build and maintain the data pipeline for our Generative AI platform. The ideal candidate will be well-versed in the latest Large Language Model (LLM) technologies and have a strong background in data engineering, with a focus on Retrieval-Augmented Generation (RAG) and knowledge-base techniques. This role sits in the AI COE within DX Tech & Digital. As an AI/LLM Data Engineer, you will report to the Director, AI Solutions & Development, who oversees the AI COE.

You will work on highly visible strategic projects, collaborating with cross-functional teams to define requirements and deliver high-quality AI solutions.

The ideal candidate will have a passion for Generative AI and LLMs, with a proven track record of delivering innovative AI applications.

Responsibilities 
• Design, implement, and maintain an end-to-end multi-stage data pipeline for LLMs, including Supervised Fine Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) data processes 
• Identify, evaluate, and integrate diverse data sources and domains to support the Generative AI platform 
• Develop and optimize data processing workflows for chunking, indexing, ingestion, and vectorization for both text and non-text data 
• Benchmark and implement various vector stores, embedding techniques, and retrieval methods 
• Create a flexible pipeline supporting multiple embedding algorithms, vector stores, and search types (e.g., vector search, hybrid search) 
• Implement and maintain auto-tagging systems and data preparation processes for LLMs 
• Develop tools for text and image data crawling, cleaning, and refinement 
• Collaborate with cross-functional teams to ensure data quality and relevance for AI/ML models 
• Work with data lakehouse architectures to optimize data storage and processing
• Integrate and optimize workflows using Snowflake and various vector store technologies 

Requirements

• Master's degree in Computer Science, Data Science, or a related field 
• 3-5 years of work experience in data engineering, preferably in AI/ML contexts 
• Proficiency in Python, JSON, HTTP, and related tools 
• Strong understanding of LLM architectures, training processes, and data requirements 
• Experience with RAG systems, knowledge base construction, and vector databases 
• Familiarity with embedding techniques, similarity search algorithms, and information retrieval concepts 
• Hands-on experience with data cleaning, tagging, and annotation processes (both manual and automated) 
• Knowledge of data crawling techniques and associated ethical considerations 
• Strong problem-solving skills and ability to work in a fast-paced, innovative environment 
• Familiarity with Snowflake and its integration in AI/ML pipelines 
• Experience with various vector store technologies and their applications in AI 
• Understanding of data lakehouse concepts and architectures 
• Excellent communication, collaboration, and problem-solving skills
• Ability to translate business needs into technical solutions
• Passion for innovation and a commitment to ethical AI development
• Experience building LLM pipelines using frameworks such as LangChain, LlamaIndex, Semantic Kernel, and OpenAI functions
• Familiarity with LLM parameters such as temperature, top-k, and repeat penalty, and with metrics and methodologies for evaluating LLM outputs

Preferred Skills

  • Experience with popular LLM/RAG frameworks
  • Familiarity with distributed computing platforms (e.g., Apache Spark, Dask) 
  • Knowledge of data versioning and experiment tracking tools 
  • Experience with cloud platforms (AWS, GCP, or Azure) for large-scale data processing 
  • Understanding of data privacy and security best practices 
  • Practical experience implementing data lakehouse solutions 
  • Proficiency in optimizing queries and data processes in Snowflake or Databricks
  • Hands-on experience with different vector store technologies

Benefits

  • US employee benefits package
