Data is a valuable business asset. Some call it the new oil. The data engineer collects, transforms and refines raw data into information that can be used by business analysts and data scientists.
As part of your internship, you will be trained in the different aspects of a data engineer's work. You will build a real-time, end-to-end streaming data ingestion pipeline combining metric collection, data cleansing and aggregation, storage in multiple data warehouses, (near) real-time analysis by exposing key metrics in a dashboard, and the use of machine learning models for the prediction and detection of weak signals.
You will participate in the application architecture and the implementation of the pipeline with the goal of going into production. You will join an agile team led by a Big Data expert.
In addition, by the end of the internship you will obtain a certification from a cloud provider and a Databricks certification.
Adaltas specializes in the processing and storage of data. We work on-premises and in the cloud to operate Big Data platforms and strengthen our clients’ teams in the areas of architecture, operations, data engineering, data science and DevOps. Partners of Cloudera and Databricks, we are also open source contributors. We invite you to browse our site and our many technical publications to learn more about Adaltas.
- Collecting system and application metrics
- Supplying a distributed data warehouse with OLAP-type column storage
- Cleansing, enrichment and aggregation of data streams
- Real-time analysis in SQL
- Dashboard creation
- Putting machine learning models into production in an MLOps cycle
- Deployment on Azure cloud infrastructure and on-premises
- Engineering school, end-of-studies internship
- Analytical and structured
- Autonomous and curious
- You are an open-minded person who enjoys sharing, communicating and learning from others
- Good knowledge of Python, Spark and Linux systems
You will be in charge of designing the technical architecture. We are looking for someone who has mastered, or is ready to develop skills in, the following tools and solutions:
All complementary experiences are valuable.
- Location: Boulogne Billancourt, France
- Languages: French or English
- Start: February 2022
- Duration: 6 months
- Teleworking: possibility of working 2 days a week remotely
A laptop with the following characteristics:
- 32GB RAM
- 1TB SSD
- 8c/16t CPU
A cluster made up of:
- 3x 28c/56t Intel Xeon Scalable Gold 6132
- 3x 192GB RAM DDR4 ECC 2666MHz
- 3x 14 SSD 480GB SATA Intel S4500 6Gbps
A Kubernetes cluster and a Hadoop cluster.
- Salary: 1200 € / month
- Restaurant tickets
- Transportation pass
- Participation in one international conference
For any request for additional information and to submit your application, please contact David Worms: