Syncing Hive Data to MySQL with DataX

Contents

1. Component environment
2. Installing DataX
  2.1 Download and extract DataX
3. Installing datax-web
  3.0 Download and build the datax-web source
  3.1 Create the datax-web metadata database in MySQL
  3.2 Install datax-web
    3.2.1 Run install.sh to extract and deploy
    3.2.2 Manually edit the datax-admin configuration file
    3.2.3 Manually edit the datax-executor configuration file
    3.2.4 Replace the Python launcher scripts under datax
    3.2.5 Replace the MySQL JDBC jar files
4. Creating the MySQL and Hive tables
  4.1 Create the MySQL database
  4.2 Create the Hive database
5. Configuring DataX and datax-web
  5.1 Start the datax-web services
  5.2 Web UI access and configuration
    5.2.1 Create a project
    5.2.2 Add the MySQL and Hive data sources
    5.2.3 Create a DataX task template
    5.2.4 Build the task
    5.2.5 View the created task
    5.2.6 Insert data in Hive
6. Checking the result


1. Component environment

Name        Version           Description           Download
hadoop      3.4.0             official binary
hive        3.1.3             built from source
mysql       8.0.31
datax       0.0.1-SNAPSHOT                          http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz
datax-web   datax-web-2.1.2   built from source     github.com
centos      centos7 x86
java        1.8

2. Installing DataX

2.1 Download and extract DataX

http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz

Extract it into the /cluster directory:

tar -zxvf datax.tar.gz -C /cluster

Run the bundled self-test job:

./bin/datax.py job/job.json

This fails with the following error:

  File "/cluster/datax/bin/./datax.py", line 114
    print readerRef
    ^^^^^^^^^^^^^^^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(...)?

Explanation: both Python 2 and Python 3 are installed on this machine, with Python 3 as the default, but the scripts under datax/bin only work with Python 2, so the interpreter has to be specified explicitly as python2. Later (section 3.2.4) these three scripts are backed up and replaced; datax-web ships Python 3 compatible versions under its doc directory.

  • Python (2.x) is required (to support Python 3, replace the three Python files under datax/bin with the ones provided in doc/datax-web/datax-python3). It is mainly used to launch the underlying DataX job from the scheduler; by default DataX is executed as a Java child process, and the Python entry point can be customized.
  • Reference: datax-web/doc/datax-web/datax-web-deploy.md at master · WeiYe-Jing/datax-web (github.com)

Running it explicitly with Python 2 succeeds:

python2 bin/datax.py job/job.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


2024-10-13 21:04:41.543 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2024-10-13 21:04:41.553 [main] INFO  Engine - the machine info  => 

    osInfo:    Oracle Corporation 1.8 25.40-b25
    jvmInfo:    Linux amd64 5.8.13-1.el7.elrepo.x86_64
    cpu num:    8

    totalPhysicalMemory:    -0.00G
    freePhysicalMemory:    -0.00G
    maxFileDescriptorCount:    -1
    currentOpenFileDescriptorCount:    -1

    GC Names    [PS MarkSweep, PS Scavenge]

    MEMORY_NAME                    | allocation_size                | init_size                      
    PS Eden Space                  | 256.00MB                       | 256.00MB                       
    Code Cache                     | 240.00MB                       | 2.44MB                         
    Compressed Class Space         | 1,024.00MB                     | 0.00MB                         
    PS Survivor Space              | 42.50MB                        | 42.50MB                        
    PS Old Gen                     | 683.00MB                       | 683.00MB                       
    Metaspace                      | -0.00MB                        | 0.00MB                         


2024-10-13 21:04:41.575 [main] INFO  Engine - 
{
    "content":[
        {
            "reader":{
                "name":"streamreader",
                "parameter":{
                    "column":[
                        {
                            "type":"string",
                            "value":"DataX"
                        },
                        {
                            "type":"long",
                            "value":19890604
                        },
                        {
                            "type":"date",
                            "value":"1989-06-04 00:00:00"
                        },
                        {
                            "type":"bool",
                            "value":true
                        },
                        {
                            "type":"bytes",
                            "value":"test"
                        }
                    ],
                    "sliceRecordCount":100000
                }
            },
            "writer":{
                "name":"streamwriter",
                "parameter":{
                    "encoding":"UTF-8",
                    "print":false
                }
            }
        }
    ],
    "setting":{
        "errorLimit":{
            "percentage":0.02,
            "record":0
        },
        "speed":{
            "byte":10485760
        }
    }
}

2024-10-13 21:04:41.599 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2024-10-13 21:04:41.601 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2024-10-13 21:04:41.601 [main] INFO  JobContainer - DataX jobContainer starts job.
2024-10-13 21:04:41.604 [main] INFO  JobContainer - Set jobId = 0
2024-10-13 21:04:41.623 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2024-10-13 21:04:41.624 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do prepare work .
2024-10-13 21:04:41.624 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do prepare work .
2024-10-13 21:04:41.624 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2024-10-13 21:04:41.625 [job-0] INFO  JobContainer - Job set Max-Byte-Speed to 10485760 bytes.
2024-10-13 21:04:41.626 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] splits to [1] tasks.
2024-10-13 21:04:41.627 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] splits to [1] tasks.
2024-10-13 21:04:41.649 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2024-10-13 21:04:41.654 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2024-10-13 21:04:41.657 [job-0] INFO  JobContainer - Running by standalone Mode.
2024-10-13 21:04:41.666 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2024-10-13 21:04:41.671 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2024-10-13 21:04:41.672 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2024-10-13 21:04:41.685 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2024-10-13 21:04:41.986 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[302]ms
2024-10-13 21:04:41.987 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2024-10-13 21:04:51.677 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.022s |  All Task WaitReaderTime 0.040s | Percentage 100.00%
2024-10-13 21:04:51.677 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2024-10-13 21:04:51.678 [job-0] INFO  JobContainer - DataX Writer.Job [streamwriter] do post work.
2024-10-13 21:04:51.678 [job-0] INFO  JobContainer - DataX Reader.Job [streamreader] do post work.
2024-10-13 21:04:51.678 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2024-10-13 21:04:51.680 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /cluster/datax/hook
2024-10-13 21:04:51.682 [job-0] INFO  JobContainer - 
     [total cpu info] => 
        averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
        -1.00%                         | -1.00%                         | -1.00%
                        

     [total gc info] => 
         NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
         PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
         PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             

2024-10-13 21:04:51.682 [job-0] INFO  JobContainer - PerfTrace not enable!
2024-10-13 21:04:51.683 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.022s |  All Task WaitReaderTime 0.040s | Percentage 100.00%
2024-10-13 21:04:51.684 [job-0] INFO  JobContainer - 
任务启动时刻                    : 2024-10-13 21:04:41
任务结束时刻                    : 2024-10-13 21:04:51
任务总计耗时                    :                 10s
任务平均流量                    :          253.91KB/s
记录写入速度                    :          10000rec/s
读出记录总数                    :              100000
读写失败总数                    :                   0

To print a job configuration template for an hdfsreader-to-mysqlwriter job:

./bin/datax.py -r hdfsreader -w mysqlwriter

3. Installing datax-web

3.0 Download and build the datax-web source

git@github.com:WeiYe-Jing/datax-web.git

mvn -U clean package assembly:assembly -Dmaven.test.skip=true

  • After a successful build, the packaged DataX distribution is located under {DataX_source_code_home}/target/datax/datax/, with the following structure:

    $ cd  {DataX_source_code_home}
    $ ls ./target/datax/datax/
    bin		conf		job		lib		log		log_perf	plugin		script		tmp
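The datax-web build likewise produces an installable package. A minimal sketch of unpacking it to /cluster, assuming the packaged tarball ends up under the build/ directory of the datax-web source tree (the source path below is hypothetical; adjust paths and version to your actual checkout):

# hypothetical location of the datax-web source checkout
cd /cluster/src/datax-web/build
tar -zxvf datax-web-2.1.2.tar.gz -C /cluster    # yields /cluster/datax-web-2.1.2 used in the rest of this article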

3.1 Create the datax-web metadata database in MySQL

mysql -u root -p

password:******

CREATE DATABASE dataxweb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

use dataxweb;

3.2 Install datax-web

3.2.1 Run install.sh to extract and deploy

In the /cluster/datax-web-2.1.2/bin directory, run:

./install.sh

Follow the prompts. When it finishes, the module packages are extracted and the metadata database is initialized.

3.2.2 Manually edit the datax-admin configuration file

/cluster/datax-web-2.1.2/modules/datax-admin/bin/env.properties

Content:

# environment variables

JAVA_HOME=/java/jdk

WEB_LOG_PATH=/cluster/datax-web-2.1.2/modules/datax-admin/logs
WEB_CONF_PATH=/cluster/datax-web-2.1.2/modules/datax-admin/conf

DATA_PATH=/cluster/datax-web-2.1.2/modules/datax-admin/data
SERVER_PORT=6895

PID_FILE_PATH=/cluster/datax-web-2.1.2/modules/datax-admin/dataxadmin.pid


# mail account
MAIL_USERNAME="1024122298@qq.com"
MAIL_PASSWORD="*********************"


#debug
REMOTE_DEBUG_SWITCH=true
REMOTE_DEBUG_PORT=7223
 

3.2.3 Manually edit the datax-executor configuration file

/cluster/datax-web-2.1.2/modules/datax-executor/bin/env.properties

Content (the key setting is PYTHON_PATH=/cluster/datax/bin/datax.py):

# environment variables

JAVA_HOME=/java/jdk

SERVICE_LOG_PATH=/cluster/datax-web-2.1.2/modules/datax-executor/logs
SERVICE_CONF_PATH=/cluster/datax-web-2.1.2/modules/datax-executor/conf
DATA_PATH=/cluster/datax-web-2.1.2/modules/datax-executor/data


## location where DataX job JSON files are stored
JSON_PATH=/cluster/datax-web-2.1.2/modules/datax-executor/json


## executor_port
EXECUTOR_PORT=9999


## keep this consistent with the datax-admin port
DATAX_ADMIN_PORT=6895

## path of the Python launcher script
PYTHON_PATH=/cluster/datax/bin/datax.py

## datax-web executor service port
SERVER_PORT=9504

PID_FILE_PATH=/cluster/datax-web-2.1.2/modules/datax-executor/service.pid


#debug remote debugging port
REMOTE_DEBUG_SWITCH=true
REMOTE_DEBUG_PORT=7224
 

3.2.4 Replace the Python launcher scripts under datax

  • Python (2.x) is required (to support Python 3, replace the three Python files under datax/bin with the ones provided in doc/datax-web/datax-python3). It is mainly used to launch the underlying DataX job from the scheduler; by default DataX is executed as a Java child process, and the Python entry point can be customized.
  • These replacement files can be taken from the datax-web source tree.
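A minimal sketch of the swap, assuming DataX lives under /cluster/datax and the three scripts are datax.py, dxprof.py and perftrace.py (take the Python 3 versions from doc/datax-web/datax-python3 in your datax-web source checkout; the source path below is hypothetical):

# back up the original Python 2 launcher scripts
cd /cluster/datax/bin
mkdir -p bak && cp datax.py dxprof.py perftrace.py bak/

# copy in the Python 3 versions shipped with the datax-web source
cp /cluster/src/datax-web/doc/datax-web/datax-python3/*.py /cluster/datax/bin/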

3.2.5 Replace the MySQL JDBC jar files

The MySQL connector jars bundled with the DataX mysqlreader and mysqlwriter plugins are too old for MySQL 8, so they need to be replaced.

The target paths are:

/cluster/datax/plugin/writer/mysqlwriter/libs/mysql-connector-j-8.3.0.jar

/cluster/datax/plugin/reader/mysqlreader/libs/mysql-connector-j-8.3.0.jar
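A hedged sketch of the replacement, assuming the old driver is a mysql-connector-java 5.x jar and the new driver has been downloaded to /tmp (match the file names to what is actually in your libs directories):

for p in reader/mysqlreader writer/mysqlwriter; do
  rm -f /cluster/datax/plugin/$p/libs/mysql-connector-java-5*.jar      # drop the old 5.x driver
  cp /tmp/mysql-connector-j-8.3.0.jar /cluster/datax/plugin/$p/libs/   # add the MySQL 8 driver
done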

4. Creating the MySQL and Hive tables

4.1 Create the MySQL database

Create the m31094 database in MySQL, then create the mm table:

-- m31094.mm definition
CREATE TABLE `mm` (
  `uuid` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL,
  `name` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `time` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `age` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `sex` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `job` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `address` text COLLATE utf8mb4_unicode_ci,
  PRIMARY KEY (`uuid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Optionally, the table can also be created without the uuid primary key.

4.2 Create the Hive database

Hive DDL:
create database m31094;
drop table m31094.mm;
CREATE TABLE m31094.mm (
 `uuid` STRING COMMENT '主键',
 `name` STRING COMMENT '姓名',
 `time` STRING COMMENT '时间',
 `age` STRING COMMENT '年龄',
 `sex` STRING COMMENT '性别',
 `job` STRING COMMENT '工作',
 `address` STRING COMMENT '地址'
) COMMENT '美女表'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

To check the table metadata in Hive:

DESCRIBE FORMATTED m31094.mm;

5. Configuring DataX and datax-web

5.1 Start the datax-web services

Start (or restart) both modules and tail their logs:

/cluster/datax-web-2.1.2/modules/datax-admin/bin/datax-admin.sh restart

tail -f /cluster/datax-web-2.1.2/modules/datax-admin/bin/console.out


/cluster/datax-web-2.1.2/modules/datax-executor/bin/datax-executor.sh restart
tail -f /cluster/datax-web-2.1.2/modules/datax-executor/bin/console.out

Check the Java processes with: jps -l

To kill the processes:

sudo kill -9 $(ps -ef|grep datax|gawk '$0 !~/grep/ {print $2}' |tr -s '\n' ' ')

The datax-executor must start successfully; once it has, its auto-registered executor can be seen in the web UI.
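A hedged way to confirm both services are up (ports as configured in the env.properties files above):

jps -l | grep -i datax              # should list the datax-admin and datax-executor main classes
ss -lntp | grep -E ':6895|:9504'    # admin (6895) and executor (9504) ports should be listening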

5.2 Web UI access and configuration

http://ip:6895/index.html

6895 is the custom port set via SERVER_PORT above; adjust it to your actual configuration.

Default login username/password: admin/123456

5.2.1 Create a project

5.2.2 Add the MySQL and Hive data sources

MySQL

HIVE 

5.2.3 Create a DataX task template

5.2.4 Build the task

Step 1: configure the input (reader)

Step 2: configure the output (writer)

Step 3: map the fields

Step 4: build, select the task template, copy the JSON, and proceed to the next step

The generated job JSON:

{
  "job": {
    "setting": {
      "speed": {
        "channel": 3,
        "byte": 1048576
      },
      "errorLimit": {
        "record": 0,
        "percentage": 0.02
      }
    },
    "content": [
      {
        "reader": {
          "name": "hdfsreader",
          "parameter": {
            "path": "/cluster/hive/warehouse/m31094.db/mm",
            "defaultFS": "hdfs://10.7.215.33:8020",
            "fileType": "text",
            "fieldDelimiter": ",",
            "skipHeader": false,
            "column": [
              {
                "index": "0",
                "type": "string"
              },
              {
                "index": "1",
                "type": "string"
              },
              {
                "index": "2",
                "type": "string"
              },
              {
                "index": "3",
                "type": "string"
              },
              {
                "index": "4",
                "type": "string"
              },
              {
                "index": "5",
                "type": "string"
              },
              {
                "index": "6",
                "type": "string"
              }
            ]
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "username": "yRjwDFuoPKlqya9h9H2Amg==",
            "password": "XCYVpFosvZBBWobFzmLWvA==",
            "column": [
              "`uuid`",
              "`name`",
              "`time`",
              "`age`",
              "`sex`",
              "`job`",
              "`address`"
            ],
            "connection": [
              {
                "table": [
                  "mm"
                ],
                "jdbcUrl": "jdbc:mysql://10.7.215.33:3306/m31094"
              }
            ]
          }
        }
      }
    ]
  }
}
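Note that the username and password in the generated JSON are not plaintext; datax-web appears to store them encrypted and resolve them when the executor launches the job. As a hedged sketch, to test the job outside datax-web you could save the JSON to a file, substitute plaintext credentials, and run it with the DataX launcher directly (the file name is an example):

# hypothetical manual run of the generated job with plaintext credentials substituted
python2 /cluster/datax/bin/datax.py /tmp/hive2mysql_mm.json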

5.2.5 View the created task

5.2.6 Insert data in Hive

insert into m31094.mm values('9','hive数据使用datax同步到MySQL',from_unixtime(unix_timestamp()),'1000000000090101','北京','新疆','加油');

Hive console output:
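Each such INSERT into this non-transactional text table writes a new file under the table's HDFS location, which is why the job log in the next section picks up several 000000_0_copy_N files. A quick way to list them (path as used in the job configuration):

hdfs dfs -ls hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm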

6. Checking the result

Job execution log from datax-web:

2024-10-13 21:30:00 [JobThread.run-130] <br>----------- datax-web job execute start -----------<br>----------- Param:
2024-10-13 21:30:00 [BuildCommand.buildDataXParam-100] ------------------Command parameters:
2024-10-13 21:30:00 [ExecutorJobHandler.execute-57] ------------------DataX process id: 95006
2024-10-13 21:30:00 [ProcessCallbackThread.callbackLog-186] <br>----------- datax-web job callback finish.
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.588 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.597 [main] INFO  Engine - the machine info  => 
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	osInfo:	Oracle Corporation 1.8 25.40-b25
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	jvmInfo:	Linux amd64 5.8.13-1.el7.elrepo.x86_64
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	cpu num:	8
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	totalPhysicalMemory:	-0.00G
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	freePhysicalMemory:	-0.00G
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	maxFileDescriptorCount:	-1
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	currentOpenFileDescriptorCount:	-1
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	GC Names	[PS MarkSweep, PS Scavenge]
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	MEMORY_NAME                    | allocation_size                | init_size                      
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	PS Eden Space                  | 256.00MB                       | 256.00MB                       
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	Code Cache                     | 240.00MB                       | 2.44MB                         
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	Compressed Class Space         | 1,024.00MB                     | 0.00MB                         
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	PS Survivor Space              | 42.50MB                        | 42.50MB                        
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	PS Old Gen                     | 683.00MB                       | 683.00MB                       
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	Metaspace                      | -0.00MB                        | 0.00MB                         
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.622 [main] INFO  Engine - 
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] {
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	"content":[
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 		{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 			"reader":{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 				"name":"hdfsreader",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 				"parameter":{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					"column":[
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"index":"0",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"type":"string"
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						},
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"index":"1",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"type":"string"
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						},
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"index":"2",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"type":"string"
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						},
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"index":"3",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"type":"string"
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						},
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"index":"4",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"type":"string"
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						},
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"index":"5",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"type":"string"
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						},
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"index":"6",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"type":"string"
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						}
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					],
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					"defaultFS":"hdfs://10.7.215.33:8020",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					"fieldDelimiter":",",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					"fileType":"text",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					"path":"hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					"skipHeader":false
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 				}
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 			},
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 			"writer":{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 				"name":"mysqlwriter",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 				"parameter":{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					"column":[
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						"`uuid`",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						"`name`",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						"`time`",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						"`age`",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						"`sex`",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						"`job`",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						"`address`"
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					],
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					"connection":[
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"jdbcUrl":"jdbc:mysql://10.7.215.33:3306/m31094",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							"table":[
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 								"mm"
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 							]
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 						}
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					],
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					"password":"******",
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 					"username":"root"
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 				}
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 			}
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 		}
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	],
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	"setting":{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 		"errorLimit":{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 			"percentage":0.02,
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 			"record":0
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 		},
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 		"speed":{
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 			"byte":1048576,
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 			"channel":3
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 		}
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 	}
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] }
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.645 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.647 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.648 [main] INFO  JobContainer - DataX jobContainer starts job.
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.650 [main] INFO  JobContainer - Set jobId = 0
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.667 [job-0] INFO  HdfsReader$Job - init() begin...
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.993 [job-0] INFO  HdfsReader$Job - hadoopConfig details:{"finalParameters":[]}
2024-10-13 21:30:00 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:00.994 [job-0] INFO  HdfsReader$Job - init() ok and end...
2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:01.580 [job-0] INFO  OriginalConfPretreatmentUtil - table:[mm] all columns:[
2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] uuid,name,time,age,sex,job,address
2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] ].
2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:01.613 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] INSERT INTO %s (`uuid`,`name`,`time`,`age`,`sex`,`job`,`address`) VALUES(?,?,?,?,?,?,?)
2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] ], which jdbcUrl like:[jdbc:mysql://10.7.215.33:3306/m31094?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:01.614 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:01.614 [job-0] INFO  JobContainer - DataX Reader.Job [hdfsreader] do prepare work .
2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:01.614 [job-0] INFO  HdfsReader$Job - prepare(), start to getAllFiles...
2024-10-13 21:30:01 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:01.614 [job-0] INFO  HdfsReader$Job - get HDFS all files in path = [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm]
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.699 [job-0] INFO  HdfsReader$Job - [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0]是[text]类型的文件, 将该文件加入source files列表
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.709 [job-0] INFO  HdfsReader$Job - [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_1]是[text]类型的文件, 将该文件加入source files列表
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.718 [job-0] INFO  HdfsReader$Job - [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_2]是[text]类型的文件, 将该文件加入source files列表
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.728 [job-0] INFO  HdfsReader$Job - [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_3]是[text]类型的文件, 将该文件加入source files列表
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.737 [job-0] INFO  HdfsReader$Job - [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_4]是[text]类型的文件, 将该文件加入source files列表
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.759 [job-0] INFO  HdfsReader$Job - [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_5]是[text]类型的文件, 将该文件加入source files列表
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.771 [job-0] INFO  HdfsReader$Job - [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_6]是[text]类型的文件, 将该文件加入source files列表
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.771 [job-0] INFO  HdfsReader$Job - 您即将读取的文件数为: [7], 列表为: [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_5,hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_4,hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_3,hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_2,hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_1,hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0,hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_6]
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.772 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.773 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.774 [job-0] INFO  JobContainer - Job set Max-Byte-Speed to 1048576 bytes.
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.775 [job-0] INFO  HdfsReader$Job - split() begin...
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.777 [job-0] INFO  JobContainer - DataX Reader.Job [hdfsreader] splits to [7] tasks.
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.779 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] splits to [7] tasks.
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.797 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.810 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.813 [job-0] INFO  JobContainer - Running by standalone Mode.
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.825 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [7] tasks.
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.830 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.831 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.845 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[2] attemptCount[1] is started
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.883 [0-0-2-reader] INFO  HdfsReader$Job - hadoopConfig details:{"finalParameters":["mapreduce.job.end-notification.max.retry.interval","mapreduce.job.end-notification.max.attempts"]}
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.885 [0-0-2-reader] INFO  Reader$Task - read start
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.886 [0-0-2-reader] INFO  Reader$Task - reading file : [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_3]
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.918 [0-0-2-reader] INFO  UnstructuredStorageReaderUtil - CsvReader使用默认值[{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":",","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":"\"","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}],csvReaderConfig值为[null]
2024-10-13 21:30:02 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:02.925 [0-0-2-reader] INFO  Reader$Task - end read source files...
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.247 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[2] is successed, used[403]ms
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.250 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[5] attemptCount[1] is started
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.286 [0-0-5-reader] INFO  HdfsReader$Job - hadoopConfig details:{"finalParameters":["mapreduce.job.end-notification.max.retry.interval","mapreduce.job.end-notification.max.attempts"]}
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.287 [0-0-5-reader] INFO  Reader$Task - read start
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.287 [0-0-5-reader] INFO  Reader$Task - reading file : [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0]
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.290 [0-0-5-reader] INFO  UnstructuredStorageReaderUtil - CsvReader使用默认值[{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":",","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":"\"","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}],csvReaderConfig值为[null]
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.292 [0-0-5-reader] INFO  Reader$Task - end read source files...
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.351 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[5] is successed, used[101]ms
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.354 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[4] attemptCount[1] is started
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.379 [0-0-4-reader] INFO  HdfsReader$Job - hadoopConfig details:{"finalParameters":["mapreduce.job.end-notification.max.retry.interval","mapreduce.job.end-notification.max.attempts"]}
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.380 [0-0-4-reader] INFO  Reader$Task - read start
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.380 [0-0-4-reader] INFO  Reader$Task - reading file : [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_1]
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.384 [0-0-4-reader] INFO  UnstructuredStorageReaderUtil - CsvReader使用默认值[{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":",","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":"\"","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}],csvReaderConfig值为[null]
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.386 [0-0-4-reader] INFO  Reader$Task - end read source files...
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.454 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[4] is successed, used[101]ms
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.457 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.486 [0-0-0-reader] INFO  HdfsReader$Job - hadoopConfig details:{"finalParameters":["mapreduce.job.end-notification.max.retry.interval","mapreduce.job.end-notification.max.attempts"]}
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.487 [0-0-0-reader] INFO  Reader$Task - read start
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.487 [0-0-0-reader] INFO  Reader$Task - reading file : [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_5]
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.489 [0-0-0-reader] INFO  UnstructuredStorageReaderUtil - CsvReader使用默认值[{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":",","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":"\"","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}],csvReaderConfig值为[null]
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.491 [0-0-0-reader] INFO  Reader$Task - end read source files...
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.558 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[101]ms
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.561 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[1] attemptCount[1] is started
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.587 [0-0-1-reader] INFO  HdfsReader$Job - hadoopConfig details:{"finalParameters":["mapreduce.job.end-notification.max.retry.interval","mapreduce.job.end-notification.max.attempts"]}
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.588 [0-0-1-reader] INFO  Reader$Task - read start
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.588 [0-0-1-reader] INFO  Reader$Task - reading file : [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_4]
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.592 [0-0-1-reader] INFO  UnstructuredStorageReaderUtil - CsvReader使用默认值[{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":",","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":"\"","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}],csvReaderConfig值为[null]
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.594 [0-0-1-reader] INFO  Reader$Task - end read source files...
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.662 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[1] is successed, used[101]ms
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.664 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[3] attemptCount[1] is started
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.691 [0-0-3-reader] INFO  HdfsReader$Job - hadoopConfig details:{"finalParameters":["mapreduce.job.end-notification.max.retry.interval","mapreduce.job.end-notification.max.attempts"]}
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.691 [0-0-3-reader] INFO  Reader$Task - read start
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.691 [0-0-3-reader] INFO  Reader$Task - reading file : [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_2]
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.694 [0-0-3-reader] INFO  UnstructuredStorageReaderUtil - CsvReader使用默认值[{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":",","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":"\"","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}],csvReaderConfig值为[null]
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.696 [0-0-3-reader] INFO  Reader$Task - end read source files...
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.765 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[3] is successed, used[101]ms
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.768 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[6] attemptCount[1] is started
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.791 [0-0-6-reader] INFO  HdfsReader$Job - hadoopConfig details:{"finalParameters":["mapreduce.job.end-notification.max.retry.interval","mapreduce.job.end-notification.max.attempts"]}
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.791 [0-0-6-reader] INFO  Reader$Task - read start
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.791 [0-0-6-reader] INFO  Reader$Task - reading file : [hdfs://10.7.215.33:8020/cluster/hive/warehouse/m31094.db/mm/000000_0_copy_6]
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.795 [0-0-6-reader] INFO  UnstructuredStorageReaderUtil - CsvReader使用默认值[{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":",","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":"\"","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}],csvReaderConfig值为[null]
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.798 [0-0-6-reader] INFO  Reader$Task - end read source files...
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.868 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[6] is successed, used[100]ms
2024-10-13 21:30:03 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:03.869 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.838 [job-0] INFO  StandAloneJobContainerCommunicator - Total 7 records, 282 bytes | Speed 28B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.028s | Percentage 100.00%
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.838 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.838 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.839 [job-0] INFO  JobContainer - DataX Reader.Job [hdfsreader] do post work.
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.839 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.840 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /cluster/datax/hook
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.841 [job-0] INFO  JobContainer - 
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 	 [total cpu info] => 
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 		averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 		-1.00%                         | -1.00%                         | -1.00%
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53]                         
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 	 [total gc info] => 
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 		 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 		 PS MarkSweep         | 1                  | 1                  | 1                  | 0.040s             | 0.040s             | 0.040s             
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 		 PS Scavenge          | 1                  | 1                  | 1                  | 0.022s             | 0.022s             | 0.022s             
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.841 [job-0] INFO  JobContainer - PerfTrace not enable!
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.842 [job-0] INFO  StandAloneJobContainerCommunicator - Total 7 records, 282 bytes | Speed 28B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.028s | Percentage 100.00%
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 2024-10-13 21:30:12.843 [job-0] INFO  JobContainer - 
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 任务启动时刻                    : 2024-10-13 21:30:00
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 任务结束时刻                    : 2024-10-13 21:30:12
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 任务总计耗时                    :                 12s
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 任务平均流量                    :               28B/s
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 记录写入速度                    :              0rec/s
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 读出记录总数                    :                   7
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 读写失败总数                    :                   0
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 十月 13, 2024 9:30:01 下午 org.apache.hadoop.util.NativeCodeLoader <clinit>
2024-10-13 21:30:12 [AnalysisStatistics.analysisStatisticsLog-53] 警告: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-10-13 21:30:12 [JobThread.run-165] <br>----------- datax-web job execute end(finish) -----------<br>----------- ReturnT:ReturnT [code=200, msg=LogStatistics{taskStartTime=2024-10-13 21:30:00, taskEndTime=2024-10-13 21:30:12, taskTotalTime=12s, taskAverageFlow=28B/s, taskRecordWritingSpeed=0rec/s, taskRecordReaderNum=7, taskRecordWriteFailNum=0}, content=null]
2024-10-13 21:30:12 [TriggerCallbackThread.callbackLog-186] <br>----------- datax-web job callback finish.

Finally, query the MySQL database to confirm the data has been synced.
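A hedged example of such a check, using the connection details from earlier in this article:

mysql -h 10.7.215.33 -u root -p -e "SELECT * FROM m31094.mm LIMIT 10;"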
