November 7, 2024

Big data infrastructure internship | Adaltas

Job description

Big Data and distributed computing are at the core of Adaltas. We support our partners in the deployment, maintenance, and optimization of some of the largest clusters in France. More recently, we have also started providing support for day-to-day operations.

As strong advocates of open source and active contributors to it, we are at the forefront of the TDP (TOSIT Data Platform) initiative.

During this internship, you will contribute to the development of TDP, its industrialization, and the integration of new open source components and new functionalities. You will be supported by the Alliage expert team in charge of TDP vendor support.

You will also work with the Kubernetes ecosystem and on automating deployments of the Onyxia datalab, which we want to make available to our customers as well as to students as part of our teaching modules (devops, big data, etc.).

Your skills will help expand Alliage’s open source support offering. Supported open source components include TDP, Onyxia, ScyllaDB, … For those who would like to do some web work in addition to big data, we already have a very functional intranet (ticket management, time management, advanced search, mentions and related articles, …), and further features are planned.

You will practice GitOps release chains and write articles.

You will work in a team, with senior advisors as mentors.

Company presentation

Adaltas is a consulting agency led by a team of open source experts focusing on data management. We deploy and operate the storage and computing infrastructures in collaboration with our customers.

We are partners of Cloudera and Databricks, and we are also open source contributors. We invite you to browse our site and our many technical publications to learn more about the company.

Skills required and to be acquired

Automating the deployment of the Onyxia datalab requires knowledge of Kubernetes and cloud-native technologies. You must be comfortable with the Kubernetes ecosystem, the Hadoop ecosystem, and the distributed computing model. You will master how the basic components (HDFS, YARN, object storage, Kerberos, OAuth, etc.) work together to serve big data use cases.

Good working knowledge of Linux and the command line is required.

During the internship, you will learn:

  • The Kubernetes/Hadoop ecosystem in order to contribute to the TDP project
  • Securing clusters with Kerberos and SSL/TLS certificates
  • High availability (HA) of services
  • The distribution of resources and workloads
  • Supervision of services and hosted applications
  • Fault-tolerant Hadoop clusters, with recovery of lost data after an infrastructure failure
  • Infrastructure as Code (IaC) via DevOps tools such as Ansible and [Vagrant](/en/tag/hashicorp-vagrant/)
  • The architecture and operation of a data lakehouse
  • Code collaboration with Git, GitLab and GitHub

Responsibilities

  • Become familiar with the architecture and configuration methods of the TDP distribution
  • Deploy and test secure and highly available TDP clusters
  • Contribute to the TDP knowledge base with troubleshooting guides, FAQs and articles
  • Actively contribute ideas and code to make iterative improvements to the TDP ecosystem
  • Research and analyze the differences between the main Hadoop distributions
  • Update Adaltas Cloud using Nikita
  • Contribute to the development of a tool to collect customer logs and metrics on TDP and ScyllaDB
  • Actively contribute ideas to develop our support solution

Additional information

  • Location: Boulogne Billancourt, France
  • Languages: French or English
  • Starting date: March 2023
  • Duration: 6 months

Much of the digital world runs on Open Source software and the Big Data industry is booming. This internship is an opportunity to gain valuable experience in both domains. TDP is now the only truly Open Source Hadoop distribution, which gives the project great momentum. As part of the TDP team, you will have the opportunity to learn one of the core big data processing models and to participate in the development and the future roadmap of TDP. We believe that this is an exciting opportunity and that, on completion of the internship, you will be ready for a successful career in Big Data.

Equipment available

A laptop with the following characteristics:

  • 32GB RAM
  • 1TB SSD
  • 8c/16t CPU

A cluster made up of:

  • 3x 28c/56t Intel Xeon Scalable Gold 6132
  • 3x 192GB RAM DDR4 ECC 2666MHz
  • 3x 14 SSD 480GB SATA Intel S4500 6Gbps

A Kubernetes cluster and a Hadoop cluster.

Remuneration

  • Salary 1200 € / month
  • Restaurant tickets
  • Transportation pass
  • Participation in one international conference

In the past, the conferences we attended have included KubeCon, organized by the CNCF, the Open Source Summit from the Linux Foundation, and FOSDEM.

For any request for additional information and to submit your application, please contact David Worms: