Magic (magic.dev)

Software Engineer - Pretraining Data

Posted 12 Days Ago
Remote
2 Locations
100K Annually
Mid level
The Software Engineer will create data pipelines and web crawlers for multimodal datasets and ensure data quality through distributed computing techniques.
The summary above was generated by AI

Magic’s mission is to build safe AGI that accelerates humanity’s progress on the world’s most important problems. We believe the most promising path to safe AGI lies in automating research and code generation to improve models and solve alignment more reliably than humans can alone. Our approach combines frontier-scale pre-training, domain-specific RL, ultra-long context, and inference-time compute to achieve this goal.

About the role: 

As a Software Engineer working on our pretraining data, you will write efficient and robust pipelines for giant, multimodal datasets. You will develop and optimize web scraping techniques to harvest and maintain data at internet scale.

What you might work on:

  • Design & implement multimodal (video, audio, text, etc.) web crawlers for scraping and indexing petabytes of data

  • Create large-scale data processing pipelines using tools like Ray, Apache Spark, Apache Flink, Google BigQuery, etc.

  • Implement and scale deduplication techniques across modalities and apply heuristic and model-based techniques for parsing and filtering crawled data

  • Identify new data sources for inclusion in pre/post-training datasets

What we’re looking for:

  • Strong proficiency in distributed computing and parallel processing techniques

  • Obsession with details, reliability, and good testing to ensure data quality and integrity

  • Experience with designing and maintaining high-performance, scalable data architectures

  • Ability to design, develop and operate an LLM data pipeline from web scraping to data loading

Magic strives to be the place where high-potential individuals can do their best work. We value quick learning and grit just as much as skill and experience.

Our culture:

  • Integrity. Words and actions should be aligned

  • Hands-on. At Magic, everyone is building 

  • Teamwork. We move as one team, not N individuals

  • Focus. Safely deploy AGI. Everything else is noise

  • Quality. Magic should feel like magic

Compensation, benefits and perks (US):

  • Annual salary range: $100K - $550K

  • Equity is a significant part of total compensation, in addition to salary

  • 401(k) plan with 6% salary matching

  • Generous health, dental and vision insurance for you and your dependents

  • Unlimited paid time off

  • Visa sponsorship and relocation stipend to bring you to SF, if possible

  • A small, fast-paced, highly focused team

Top Skills

Apache Flink
Spark
Google BigQuery
Ray
