Published:
28.10.25
Atlas AI Training Data Curation Internship: Fueling Knowledge Graph Intelligence
About Cognite
Embark on a transformative journey with Cognite, a global SaaS forerunner in leveraging AI and data to unravel complex business challenges through our cutting-edge offerings, including Cognite Atlas AI, an industrial agent workbench, and the Cognite Data Fusion (CDF) platform. We were named the 2022 Technology Innovation Leader for Global Digital Industrial Platforms and recognized as the 2024 Microsoft Energy and Resources Partner of the Year. In the realm of industrial digital transformation, we stand at the forefront, reshaping the future of Oil & Gas, Chemicals, Pharma and other Manufacturing and Energy sectors. Join us in this venture where AI and data meet ingenuity, and together, we forge the path to a smarter, more connected industrial future.
Our values
Impact: Cogniters strive to make an impact in all that they do. We are results-oriented, always asking ourselves what impact our work makes.
Ownership: Cogniters embrace a culture of ownership. We go beyond our comfort zones to contribute to the greater good, fostering inclusivity and sharing responsibilities for challenges and success.
Relentless: Cogniters are relentless in their pursuit of innovation. We are determined and deliberate (never ruthless or reckless), facing challenges head-on and viewing setbacks as opportunities for growth.
The Atlas AI team is at the forefront of leveraging AI to transform industrial data interactions. A key initiative involves fine-tuning large language models (LLMs) to generate precise queries for our industrial knowledge graph, enabling intelligent agents to extract relevant insights. The success of this endeavor heavily relies on high-quality training data.
This internship project offers an exceptional opportunity for students passionate about Machine Learning and AI to contribute to the foundational data infrastructure for our next-generation AI agents. Interns will be responsible for the end-to-end process of curating and preparing datasets, fine-tuning models on them, and running the results through our evaluation frameworks. This role is critical for enhancing the accuracy and effectiveness of our knowledge graph querying capabilities.
This internship will span 6-8 weeks, commencing in the first week of July. Interns will work collaboratively in pairs, fostering a dynamic and supportive learning environment.
Project Scope & Activities
- Training Data Curation & Preparation:
  - Designing and implementing strategies for collecting and annotating high-quality training data specific to industrial knowledge graph query generation.
  - Working with domain experts to ensure the accuracy and relevance of the curated data.
  - Developing scripts and tools in Python to automate data cleaning, transformation, and formatting for model training.
  - Ensuring data privacy and compliance standards are met during curation.
- Model Fine-tuning:
  - Experimenting with various fine-tuning techniques for pre-trained large language models (LLMs) on the curated datasets.
  - Utilizing cloud AI platforms (Google Cloud AI Platform, AWS SageMaker, Azure Machine Learning) for model training and deployment.
  - Monitoring training progress, analyzing model performance metrics, and iterating on fine-tuning strategies.
Expected Outcomes
- Successfully curated a high-quality dataset suitable for fine-tuning LLMs for knowledge graph query generation.
- Contributed to the fine-tuning of LLMs on major cloud platforms (Google, AWS, Azure), leading to improved query generation capabilities.
- Provided actionable insights and recommendations to the product team based on data analysis and model performance.
- Gained significant practical experience in applied machine learning, data engineering for AI, and cloud computing environments.
- Authored clear documentation on data curation processes, fine-tuning experiments, and evaluation methodologies.
Required Skills & Qualifications
- Machine Learning / Artificial Intelligence: Solid theoretical understanding of machine learning concepts, including natural language processing (NLP) and large language models.
- Python: Advanced proficiency in Python for data manipulation, scripting, and ML framework utilization (e.g., TensorFlow, PyTorch, Hugging Face Transformers).
- Data Analysis: Experience with data analysis libraries (e.g., Pandas, NumPy) and data visualization.
- Problem-Solving: Strong analytical and problem-solving abilities, with a methodical approach to data challenges.
- Collaboration: Ability to work effectively in a team, communicate technical concepts clearly, and adapt to evolving project requirements.
Bonus Skills (Nice to Have):
- Experience with cloud platforms (Google Cloud Platform, AWS, Azure) for ML workloads.
- Familiarity with knowledge graphs, graph databases, or semantic web technologies.
- Experience with MLOps practices or experiment tracking tools.
- Understanding of prompt engineering and agent design principles.
Cognite
The key to industrial digitalization lies in data liberation. Heavy-asset industries like oil and gas, shipping, manufacturing, and power and utilities already have the data. Now they need software to collect, clean, and contextualize the data: a resource that transforms the data into information and stimulates a thriving ecosystem of industrial applications.
Cognite Data Fusion (CDF) presents a digital representation of industrial reality to make it accessible and meaningful for humans and machines.
With CDF, our industrial customers can harness the potential of advanced analytics, deploy algorithms, and build customized applications. We make it possible to maximize the strategic value of data.
Realizing the promise of digitalization
To succeed, we need a wide range of skill sets, such as backend programming with large-scale distributed systems, real-time systems, machine learning, optimization, web frontends, 3D models, robots and more. We need project managers who can be consultants for our customers. It will be a very exciting environment where team members will learn new skills from some of the best.
Why work for Cognite?
- You will have a real impact on our customers and Cognite
- Free snacks and drinks throughout the day
- Opportunity to work for, and contribute to the growth of, one of the most exciting and fastest-growing new software companies in the world
- Competitive salary and benefits (including pension plans, insurance, parental benefits and more)
- Coverage of mobile telephone subscription and broadband connection
- Extended private health services and free yearly health check
- Subsidized lunch at the canteen, with various food options (pizza/sushi)
- Free staffed gym
- Social activities (book club, team sports activities such as football and boxing, and regular Cognite social events)
- Free online Norwegian courses for levels A1 and A2