📄️ Initial Setup
The Japanese version of this page is currently being prepared. For the latest content, please refer to the English or Vietnamese version.
🗃️ Workspace
A Workspace is a dedicated working environment for users of the Data Platform. It provides an isolated, secure space in which users can perform data-related operations and workflows efficiently.
🗃️ CDC Service
CDC Service provides a real-time Change Data Capture (CDC) platform for monitoring database changes. Users define connectors that move data between various database systems and Kafka, in either direction.
🗃️ Apache Superset
Apache Superset is an open-source Business Intelligence (BI) platform that enables users to visualize data, create interactive dashboards, and perform data analysis with ease. It serves as a powerful alternative to Tableau and Power BI, especially in big data ecosystems leveraging platforms such as Druid, Presto, Trino, BigQuery, ClickHouse, MySQL, PostgreSQL, and many others.
🗃️ JupyterHub
JupyterHub is an open-source platform designed to provide a multi-user Jupyter Notebook environment, enabling data scientists, data engineers, and software developers to access computational resources for data analysis, data processing, and machine learning model development. When integrated into the Cloud Data Platform, JupyterHub becomes a core component that allows management, scaling, and optimization of resources across cloud services, thereby supporting large-scale data storage and processing workflows.
🗃️ Ranger
FPT Data Governance, powered by Ranger, is a security and access control solution designed for Lakehouse environments using the Trino query engine. It provides centralized and fine-grained access management, supporting both Role-Based Access Control (RBAC) and Attribute-Based Access Control (ABAC) models.
🗃️ Hive Metastore
Hive Metastore is a core component for metadata storage within a Lakehouse architecture. It provides information about tables, schemas, partitions, and data locations, enabling engines such as Apache Spark, Trino, and Presto to efficiently understand, manage, and access data.
🗃️ Query Engine
FPT Query Engine is powered by Trino, an open-source distributed SQL query engine designed for fast, efficient querying across large-scale datasets. Trino enables users to query data from multiple sources, including relational databases, data warehouses, and non-relational storage systems, without the need to move or duplicate data.
🗃️ Nessie
Nessie is a transactional catalog for data lakes with Git-like version control semantics. It is designed to support large-scale, complex distributed data environments, helping data teams manage data development, version control, and deployment processes across the system.
🗃️ Flink
Apache Flink is an open-source distributed data processing framework primarily designed for real-time stream processing. In addition to stream processing, it also supports batch processing, but it is especially recognized for its ability to handle continuous data streams with low latency. Flink offers flexible scalability, supports stateful processing, and ensures data consistency, making it a leading choice for Big Data Analytics, Machine Learning, IoT, financial systems, and system monitoring applications.
🗃️ Orchestration
The Orchestration service manages and automates workflows within a data system. It ensures that data processing tasks run sequentially or in parallel according to schedules or events, and it provides monitoring and troubleshooting capabilities.
🗃️ Ingestion Service
The Ingestion service manages, orchestrates, and automates the movement of data between systems. It also provides monitoring and supervision of those data flows.
🗃️ Processing Service
The Processing Service runs on the Data Platform and provides batch and real-time data processing on user-configured compute resources. It supports both CPU and GPU environments, enabling distributed execution of high-performance data processing tasks.
🗃️ Open Metadata
Open Metadata is a platform for managing and automating metadata within a data system. It centralizes the collection, organization, and governance of information about data objects from multiple sources. The platform supports data tracking, classification, lineage tracing, and change alerting, which improves operational efficiency and helps maintain data quality across the organization.