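Project overview:
In this series, Xiao Li Ge introduces one cutting-edge AI solution built on the AWS cloud platform each day, to help readers quickly learn AWS AI best practices and apply them in their daily work.
This post shows how to use AWS Service Catalog to create and manage application products that include large AI models, how to use permission management to restrict the cloud resources employees can access according to their identity and responsibilities, and how to create an Amazon SageMaker managed machine learning environment for training and deploying large models, with model files and model container images loaded privately and securely through VPC endpoints. The design uses a cloud-native, serverless architecture and aims to provide a scalable and secure AI solution. The solution architecture is shown in the diagram below: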
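Background knowledge for this solution
What is Amazon SageMaker?
Amazon SageMaker is the one-stop machine learning service from AWS, designed to help developers and data scientists build, train, and deploy machine learning models. SageMaker provides tooling for the whole workflow, from data preparation and model training through to model deployment, so that users can run machine learning projects efficiently in the cloud.
What is AWS Service Catalog?
AWS Service Catalog is a service that helps organizations create, manage, and distribute collections of approved cloud services. With Service Catalog, an organization can centrally manage approved resources and configurations, so that development teams follow the organization's best practices and compliance requirements when they use cloud services. Users pick the services they need from a predefined product catalog, which simplifies resource deployment and reduces the risk of misconfiguration.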
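Security and compliance benefits of building AI services with SageMaker
Meeting enterprise compliance requirements:
When you build AI services with SageMaker, you can use Service Catalog to define and manage configuration templates in advance that meet company compliance standards, so that every AI model and resource deployment follows the organization's security policies and industry regulations such as GDPR or HIPAA.
Data security:
SageMaker offers end-to-end data encryption options, covering data at rest and in transit, to protect sensitive data throughout the AI model lifecycle. You can also use VPC endpoints to privately and securely access data in S3 and to pull the AI model container images stored in ECR repositories.
Access control and monitoring:
Through integration with AWS Identity and Access Management (IAM), you can control at a fine-grained level who may access and operate on SageMaker resources. Combined with monitoring tools such as CloudTrail and CloudWatch, this lets an organization track and audit operations for transparency and security.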
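As a rough illustration of the tag-based access control described above, the sketch below builds an IAM policy document that only lets a principal open notebook instances whose ProjectID tag matches the principal's own ProjectID tag. The function name, the two chosen actions, and the example region and account ID are assumptions made for illustration; the template later in this post uses its own, longer policy.

```python
import json


def build_project_scoped_policy(region: str, account_id: str) -> dict:
    """Return an IAM policy allowing notebook access only within the caller's project.

    Hypothetical helper: the action list is an illustrative subset, not the
    policy used by the template in this post.
    """
    notebook_arn = f"arn:aws:sagemaker:{region}:{account_id}:notebook-instance/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "OpenOwnProjectNotebooks",
                "Effect": "Allow",
                "Action": [
                    "sagemaker:DescribeNotebookInstance",
                    "sagemaker:CreatePresignedNotebookInstanceUrl",
                ],
                "Resource": notebook_arn,
                # The resource tag must equal the tag carried by the calling principal.
                "Condition": {
                    "StringEquals": {
                        "sagemaker:ResourceTag/ProjectID": "${aws:PrincipalTag/ProjectID}"
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Example values only; replace with your own region and account ID.
    print(json.dumps(build_project_scoped_policy("us-east-1", "123456789012"), indent=2))
```

Because the condition compares two tags rather than naming a project, the same policy can be attached to every project role without modification.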
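What this solution covers
1. Access model files in S3 privately through a VPC endpoint.
2. Create an AWS Service Catalog portfolio to create and manage users' cloud service products in one place.
3. As a Service Catalog end user, create a SageMaker notebook instance for machine learning training.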
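Step-by-step setup:
1. Sign in to the AWS console, open the serverless compute service Lambda, and create a Lambda function named "SageMakerBuild". Copy in the code below, which creates the SageMaker Jupyter Notebook used to train large AI models. The code imports the requests library, which the Lambda Python runtime does not bundle, so package it with the function or attach it as a layer. The code also waits for the notebook to start, so raise the function timeout well above the 3-second default.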
import json
import time

import boto3
import requests  # must be packaged with the function or supplied as a layer


def CFTFailedResponse(event, status, message):
    """Report a failure to CloudFormation through the pre-signed ResponseURL."""
    print("Inside CFTFailedResponse")
    responseBody = {
        'Status': status,
        'Reason': message,
        'PhysicalResourceId': event['ServiceToken'],
        'StackId': event['StackId'],
        'RequestId': event['RequestId'],
        'LogicalResourceId': event['LogicalResourceId'],
    }
    headers = {
        'content-type': '',
        'content-length': str(len(json.dumps(responseBody))),
    }
    print('Response = ' + json.dumps(responseBody))
    try:
        req = requests.put(event['ResponseURL'], data=json.dumps(responseBody), headers=headers)
        print("delete_respond_cloudformation res " + str(req))
    except Exception as e:
        print("Failed to send cf response {}".format(e))


def CFTSuccessResponse(event, status, data=None):
    """Report success (and optional output data) to CloudFormation."""
    responseBody = {
        'Status': status,
        'Reason': 'See the details in CloudWatch Log Stream',
        'PhysicalResourceId': event['ServiceToken'],
        'StackId': event['StackId'],
        'RequestId': event['RequestId'],
        'LogicalResourceId': event['LogicalResourceId'],
        'Data': data,
    }
    headers = {
        'content-type': '',
        'content-length': str(len(json.dumps(responseBody))),
    }
    print('Response = ' + json.dumps(responseBody))
    try:
        req = requests.put(event['ResponseURL'], data=json.dumps(responseBody), headers=headers)
    except Exception as e:
        print("Failed to send cf response {}".format(e))


def lambda_handler(event, context):
    ReqStatus = "SUCCESS"
    print("Event:")
    print(event)
    client = boto3.client('sagemaker')
    ec2client = boto3.client('ec2')
    data = {}

    if event['RequestType'] == 'Create':
        try:
            ## Value initialization from CFT ##
            project_name = event['ResourceProperties']['ProjectName']
            kmsKeyId = event['ResourceProperties']['KmsKeyId']
            Tags = event['ResourceProperties']['Tags']
            env_name = event['ResourceProperties']['ENVName']
            subnet_name = event['ResourceProperties']['Subnet']
            security_group_name = event['ResourceProperties']['SecurityGroupName']
            input_dict = {}
            input_dict['NotebookInstanceName'] = event['ResourceProperties']['NotebookInstanceName']
            input_dict['InstanceType'] = event['ResourceProperties']['NotebookInstanceType']
            input_dict['Tags'] = event['ResourceProperties']['Tags']
            input_dict['DirectInternetAccess'] = event['ResourceProperties']['DirectInternetAccess']
            input_dict['RootAccess'] = event['ResourceProperties']['RootAccess']
            input_dict['VolumeSizeInGB'] = int(event['ResourceProperties']['VolumeSizeInGB'])
            input_dict['RoleArn'] = event['ResourceProperties']['RoleArn']
            input_dict['LifecycleConfigName'] = event['ResourceProperties']['LifecycleConfigName']
        except Exception as e:
            print(e)
            ReqStatus = "FAILED"
            message = "Parameter Error: " + str(e)
            CFTFailedResponse(event, "FAILED", message)
        if ReqStatus == "FAILED":
            return None

        print("Validating Environment name: " + env_name)
        print("Subnet Id Fetching.....")
        try:
            ## SageMaker subnet: looked up by its Name tag ##
            subnetName = env_name + "-ResourceSubnet"
            print(subnetName)
            response = ec2client.describe_subnets(
                Filters=[{'Name': 'tag:Name', 'Values': [subnet_name]}]
            )
            subnetId = response['Subnets'][0]['SubnetId']
            input_dict['SubnetId'] = subnetId
            print("Desc subnet done!!")
        except Exception as e:
            print(e)
            ReqStatus = "FAILED"
            message = " Project Name is invalid - Subnet Error: " + str(e)
            CFTFailedResponse(event, "FAILED", message)
        if ReqStatus == "FAILED":
            return None

        ## SageMaker security group: looked up by its Name tag ##
        print("Security GroupId Fetching.....")
        try:
            sgName = env_name + "-ResourceSG"
            response = ec2client.describe_security_groups(
                Filters=[{'Name': 'tag:Name', 'Values': [security_group_name]}]
            )
            sgId = response['SecurityGroups'][0]['GroupId']
            input_dict['SecurityGroupIds'] = [sgId]
            print("Desc sg done!!")
        except Exception as e:
            print(e)
            ReqStatus = "FAILED"
            message = "Security Group ID Error: " + str(e)
            CFTFailedResponse(event, "FAILED", message)
        if ReqStatus == "FAILED":
            return None

        try:
            if kmsKeyId:
                input_dict['KmsKeyId'] = kmsKeyId
            else:
                print("in else")
            print(input_dict)
            instance = client.create_notebook_instance(**input_dict)
            print('SageMaker CLI response')
            print(str(instance))
            responseData = {'NotebookInstanceArn': instance['NotebookInstanceArn']}
            NotebookStatus = 'Pending'
            response = client.describe_notebook_instance(
                NotebookInstanceName=event['ResourceProperties']['NotebookInstanceName']
            )
            NotebookStatus = response['NotebookInstanceStatus']
            print("NotebookStatus:" + NotebookStatus)

            ## Notebook failure ##
            if NotebookStatus == 'Failed':
                message = NotebookStatus + ": " + response.get('FailureReason', '') + " :Notebook is not coming InService"
                CFTFailedResponse(event, "FAILED", message)
            else:
                while NotebookStatus == 'Pending':
                    time.sleep(200)
                    response = client.describe_notebook_instance(
                        NotebookInstanceName=event['ResourceProperties']['NotebookInstanceName']
                    )
                    NotebookStatus = response['NotebookInstanceStatus']
                    print("NotebookStatus in loop:" + NotebookStatus)

                ## Notebook success ##
                if NotebookStatus == 'InService':
                    data['Message'] = "SageMaker Notebook name - " + event['ResourceProperties']['NotebookInstanceName'] + " created successfully"
                    print("message InService :", data['Message'])
                    CFTSuccessResponse(event, "SUCCESS", data)
                else:
                    message = NotebookStatus + ": " + response.get('FailureReason', '') + " :Notebook is not coming InService"
                    print("message :", message)
                    CFTFailedResponse(event, "FAILED", message)
        except Exception as e:
            print(e)
            ReqStatus = "FAILED"
            CFTFailedResponse(event, "FAILED", str(e))

    if event['RequestType'] == 'Delete':
        NotebookStatus = None
        lifecycle_config = event['ResourceProperties']['LifecycleConfigName']
        NotebookName = event['ResourceProperties']['NotebookInstanceName']
        try:
            response = client.describe_notebook_instance(NotebookInstanceName=NotebookName)
            NotebookStatus = response['NotebookInstanceStatus']
            print("Notebook Status - " + NotebookStatus)
        except Exception as e:
            print(e)
            NotebookStatus = "Invalid"

        while NotebookStatus == 'Pending':
            time.sleep(30)
            response = client.describe_notebook_instance(NotebookInstanceName=NotebookName)
            NotebookStatus = response['NotebookInstanceStatus']
            print("NotebookStatus:" + NotebookStatus)

        if NotebookStatus != 'Failed' and NotebookStatus != 'Invalid':
            print("Delete request for Notebook name: " + NotebookName)
            print("Stopping the Notebook.....")
            if NotebookStatus != 'Stopped':
                try:
                    response = client.stop_notebook_instance(NotebookInstanceName=NotebookName)
                    NotebookStatus = 'Stopping'
                    print("Notebook Status - " + NotebookStatus)
                    while NotebookStatus == 'Stopping':
                        time.sleep(30)
                        response = client.describe_notebook_instance(NotebookInstanceName=NotebookName)
                        NotebookStatus = response['NotebookInstanceStatus']
                        print("NotebookStatus:" + NotebookStatus)
                except Exception as e:
                    print(e)
                    NotebookStatus = "Invalid"
                    CFTFailedResponse(event, "FAILED", str(e))
            else:
                NotebookStatus = 'Stopped'
                print("NotebookStatus:" + NotebookStatus)

        if NotebookStatus != 'Invalid':
            print("Deleting The Notebook......")
            time.sleep(5)
            try:
                response = client.delete_notebook_instance(NotebookInstanceName=NotebookName)
                print("Notebook Deleted")
                data["Message"] = "Notebook Deleted"
                CFTSuccessResponse(event, "SUCCESS", data)
            except Exception as e:
                print(e)
                CFTFailedResponse(event, "FAILED", str(e))
        else:
            print("Notebook Invalid status")
            data["Message"] = "Notebook is not available"
            CFTSuccessResponse(event, "SUCCESS", data)

    if event['RequestType'] == 'Update':
        print("Update operation for Sagemaker Notebook is not recommended")
        data["Message"] = "Update operation for Sagemaker Notebook is not recommended"
        CFTSuccessResponse(event, "SUCCESS", data)
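The Create branch above polls describe_notebook_instance in a loop with no upper bound, while a single Lambda invocation can run for at most 15 minutes. One possible way to bound the wait is sketched below. The helper name, its parameters, and the "TimedOut" label are hypothetical and not part of the original function; the status and remaining-time sources are passed in as callables so the logic can be tried locally without AWS.

```python
import time
from typing import Callable


def wait_while_pending(
    describe_status: Callable[[], str],
    remaining_ms: Callable[[], int],
    poll_seconds: int = 30,
    safety_margin_ms: int = 60_000,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the status leaves 'Pending' or time is about to run out.

    Returns the last observed status, or 'TimedOut' (a label invented for this
    sketch) when another poll would not fit in the remaining time.
    """
    status = describe_status()
    while status == "Pending":
        # Keep a margin so the caller can still send a response to CloudFormation.
        if remaining_ms() < safety_margin_ms + poll_seconds * 1000:
            return "TimedOut"
        sleep(poll_seconds)
        status = describe_status()
    return status


if __name__ == "__main__":
    # Local dry run with canned statuses instead of real SageMaker calls.
    statuses = iter(["Pending", "Pending", "InService"])
    print(wait_while_pending(lambda: next(statuses), lambda: 600_000, sleep=lambda s: None))
```

Inside lambda_handler, describe_status could wrap client.describe_notebook_instance(...)['NotebookInstanceStatus'] and remaining_ms could be context.get_remaining_time_in_millis. Treating "TimedOut" as a failure and calling CFTFailedResponse keeps CloudFormation from waiting for a response that never arrives.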
2. Next, create a YAML template with the content below and upload it to an S3 bucket. CloudFormation uses it to create the SageMaker Jupyter Notebook as infrastructure as code (IaC).
AWSTemplateFormatVersion: '2010-09-09'
Description: Template to create a SageMaker notebook
Metadata:
  'AWS::CloudFormation::Interface':
    ParameterGroups:
      - Label:
          default: Environment detail
        Parameters:
          - ENVName
      - Label:
          default: SageMaker Notebook configuration
        Parameters:
          - NotebookInstanceName
          - NotebookInstanceType
          - DirectInternetAccess
          - RootAccess
          - VolumeSizeInGB
      - Label:
          default: Load S3 Bucket to SageMaker
        Parameters:
          - S3CodePusher
          - CodeBucketName
      - Label:
          default: Project detail
        Parameters:
          - ProjectName
          - ProjectID
    ParameterLabels:
      DirectInternetAccess:
        default: Default Internet Access
      NotebookInstanceName:
        default: Notebook Instance Name
      NotebookInstanceType:
        default: Notebook Instance Type
      ENVName:
        default: Environment Name
      ProjectName:
        default: Project Suffix
      RootAccess:
        default: Root access
      VolumeSizeInGB:
        default: Volume size for the SageMaker Notebook
      ProjectID:
        default: SageMaker ProjectID
      CodeBucketName:
        default: Code Bucket Name
      S3CodePusher:
        default: Copy code from S3 to SageMaker
Parameters:
  SubnetName:
    Default: ProSM-ResourceSubnet
    Description: Subnet Random String
    Type: String
  SecurityGroupName:
    Default: ProSM-ResourceSG
    Description: Security Group Name
    Type: String
  SageMakerBuildFunctionARN:
    Description: Service Token Value passed from Lambda Stack
    Type: String
  NotebookInstanceName:
    AllowedPattern: '[A-Za-z0-9-]{1,63}'
    ConstraintDescription: >-
      Maximum of 63 alphanumeric characters. Can include hyphens (-), but not
      spaces. Must be unique within your account in an AWS Region.
    Description: SageMaker Notebook instance name
    MaxLength: '63'
    MinLength: '1'
    Type: String
  NotebookInstanceType:
    ConstraintDescription: Must select a valid notebook instance type.
    Default: ml.t3.medium
    Description: Select Instance type for the SageMaker Notebook
    Type: String
  ENVName:
    Description: SageMaker infrastructure naming convention
    Type: String
  ProjectName:
    Description: >-
      The suffix appended to all resources in the stack. This will allow
      multiple copies of the same stack to be created in the same account.
    Type: String
  RootAccess:
    Description: Root access for the SageMaker Notebook user
    AllowedValues:
      - Enabled
      - Disabled
    Default: Enabled
    Type: String
  VolumeSizeInGB:
    Description: >-
      The size, in GB, of the ML storage volume to attach to the notebook
      instance. The default value is 5 GB.
    Type: Number
    Default: '20'
  DirectInternetAccess:
    Description: >-
      If you set this to Disabled this notebook instance will be able to access
      resources only in your VPC. As per the Project requirement, we have
      Disabled it.
    Type: String
    Default: Disabled
    AllowedValues:
      - Disabled
    ConstraintDescription: Must select a valid notebook instance type.
  ProjectID:
    Type: String
    Description: Enter a valid ProjectID.
    Default: QuickStart007
  S3CodePusher:
    Description: Do you want to load the code from S3 to SageMaker Notebook
    Default: 'NO'
    AllowedValues:
      - 'YES'
      - 'NO'
    Type: String
  CodeBucketName:
    Description: S3 Bucket name from which you want to copy the code to SageMaker.
    Default: lab-materials-bucket-1234
    Type: String
Conditions:
  BucketCondition: !Equals
    - 'YES'
    - !Ref S3CodePusher
Resources:
  SagemakerKMSKey:
    Type: 'AWS::KMS::Key'
    Properties:
      EnableKeyRotation: true
      Tags:
        - Key: ProjectID
          Value: !Ref ProjectID
        - Key: ProjectName
          Value: !Ref ProjectName
      KeyPolicy:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              AWS: !Sub 'arn:aws:iam::${AWS::AccountId}:root'
            Action:
              - 'kms:Encrypt'
              - 'kms:PutKeyPolicy'
              - 'kms:CreateKey'
              - 'kms:GetKeyRotationStatus'
              - 'kms:DeleteImportedKeyMaterial'
              - 'kms:GetKeyPolicy'
              - 'kms:UpdateCustomKeyStore'
              - 'kms:GenerateRandom'
              - 'kms:UpdateAlias'
              - 'kms:ImportKeyMaterial'
              - 'kms:ListRetirableGrants'
              - 'kms:CreateGrant'
              - 'kms:DeleteAlias'
              - 'kms:RetireGrant'
              - 'kms:ScheduleKeyDeletion'
              - 'kms:DisableKeyRotation'
              - 'kms:TagResource'
              - 'kms:CreateAlias'
              - 'kms:EnableKeyRotation'
              - 'kms:DisableKey'
              - 'kms:ListResourceTags'
              - 'kms:Verify'
              - 'kms:DeleteCustomKeyStore'
              - 'kms:Sign'
              - 'kms:ListKeys'
              - 'kms:ListGrants'
              - 'kms:ListAliases'
              - 'kms:ReEncryptTo'
              - 'kms:UntagResource'
              - 'kms:GetParametersForImport'
              - 'kms:ListKeyPolicies'
              - 'kms:GenerateDataKeyPair'
              - 'kms:GenerateDataKeyPairWithoutPlaintext'
              - 'kms:GetPublicKey'
              - 'kms:Decrypt'
              - 'kms:ReEncryptFrom'
              - 'kms:DisconnectCustomKeyStore'
              - 'kms:DescribeKey'
              - 'kms:GenerateDataKeyWithoutPlaintext'
              - 'kms:DescribeCustomKeyStores'
              - 'kms:CreateCustomKeyStore'
              - 'kms:EnableKey'
              - 'kms:RevokeGrant'
              - 'kms:UpdateKeyDescription'
              - 'kms:ConnectCustomKeyStore'
              - 'kms:CancelKeyDeletion'
              - 'kms:GenerateDataKey'
            Resource:
              - !Join ['', ['arn:aws:kms:', !Ref 'AWS::Region', ':', !Ref 'AWS::AccountId', ':key/*']]
          - Sid: Allow access for Key Administrators
            Effect: Allow
            Principal:
              AWS:
                - !GetAtt SageMakerExecutionRole.Arn
            Action:
              - 'kms:CreateAlias'
              - 'kms:CreateKey'
              - 'kms:CreateGrant'
              - 'kms:CreateCustomKeyStore'
              - 'kms:DescribeKey'
              - 'kms:DescribeCustomKeyStores'
              - 'kms:EnableKey'
              - 'kms:EnableKeyRotation'
              - 'kms:ListKeys'
              - 'kms:ListAliases'
              - 'kms:ListKeyPolicies'
              - 'kms:ListGrants'
              - 'kms:ListRetirableGrants'
              - 'kms:ListResourceTags'
              - 'kms:PutKeyPolicy'
              - 'kms:UpdateAlias'
              - 'kms:UpdateKeyDescription'
              - 'kms:UpdateCustomKeyStore'
              - 'kms:RevokeGrant'
              - 'kms:DisableKey'
              - 'kms:DisableKeyRotation'
              - 'kms:GetPublicKey'
              - 'kms:GetKeyRotationStatus'
              - 'kms:GetKeyPolicy'
              - 'kms:GetParametersForImport'
              - 'kms:DeleteCustomKeyStore'
              - 'kms:DeleteImportedKeyMaterial'
              - 'kms:DeleteAlias'
              - 'kms:TagResource'
              - 'kms:UntagResource'
              - 'kms:ScheduleKeyDeletion'
              - 'kms:CancelKeyDeletion'
            Resource:
              - !Join ['', ['arn:aws:kms:', !Ref 'AWS::Region', ':', !Ref 'AWS::AccountId', ':key/*']]
          - Sid: Allow use of the key
            Effect: Allow
            Principal:
              AWS:
                - !GetAtt SageMakerExecutionRole.Arn
            Action:
              - 'kms:Encrypt'
              - 'kms:Decrypt'
              - 'kms:ReEncryptTo'
              - 'kms:ReEncryptFrom'
              - 'kms:GenerateDataKeyPair'
              - 'kms:GenerateDataKeyPairWithoutPlaintext'
              - 'kms:GenerateDataKeyWithoutPlaintext'
              - 'kms:GenerateDataKey'
              - 'kms:DescribeKey'
            Resource:
              - !Join ['', ['arn:aws:kms:', !Ref 'AWS::Region', ':', !Ref 'AWS::AccountId', ':key/*']]
          - Sid: Allow attachment of persistent resources
            Effect: Allow
            Principal:
              AWS:
                - !GetAtt SageMakerExecutionRole.Arn
            Action:
              - 'kms:CreateGrant'
              - 'kms:ListGrants'
              - 'kms:RevokeGrant'
            Resource:
              - !Join ['', ['arn:aws:kms:', !Ref 'AWS::Region', ':', !Ref 'AWS::AccountId', ':key/*']]
            Condition:
              Bool:
                'kms:GrantIsForAWSResource': 'true'
  KeyAlias:
    Type: 'AWS::KMS::Alias'
    Properties:
      AliasName: 'alias/SageMaker-CMK-DS'
      TargetKeyId:
        Ref: SagemakerKMSKey
  SageMakerExecutionRole:
    Type: 'AWS::IAM::Role'
    Properties:
      Tags:
        - Key: ProjectID
          Value: !Ref ProjectID
        - Key: ProjectName
          Value: !Ref ProjectName
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - sagemaker.amazonaws.com
            Action:
              - 'sts:AssumeRole'
      Path: /
      Policies:
        - PolicyName: !Join ['', [!Ref ProjectName, SageMakerExecutionPolicy]]
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - 'iam:ListRoles'
                Resource:
                  - !Join ['', ['arn:aws:iam::', !Ref 'AWS::AccountId', ':role/*']]
              - Sid: CloudArnResource
                Effect: Allow
                Action:
                  - 'application-autoscaling:DeleteScalingPolicy'
                  - 'application-autoscaling:DeleteScheduledAction'
                  - 'application-autoscaling:DeregisterScalableTarget'
                  - 'application-autoscaling:DescribeScalableTargets'
                  - 'application-autoscaling:DescribeScalingActivities'
                  - 'application-autoscaling:DescribeScalingPolicies'
                  - 'application-autoscaling:DescribeScheduledActions'
                  - 'application-autoscaling:PutScalingPolicy'
                  - 'application-autoscaling:PutScheduledAction'
                  - 'application-autoscaling:RegisterScalableTarget'
                Resource:
                  - !Join ['', ['arn:aws:autoscaling:', !Ref 'AWS::Region', ':', !Ref 'AWS::AccountId', ':*']]
              - Sid: ElasticArnResource
                Effect: Allow
                Action:
                  - 'elastic-inference:Connect'
                Resource:
                  - !Join ['', ['arn:aws:elastic-inference:', !Ref 'AWS::Region', ':', !Ref 'AWS::AccountId', ':elastic-inference-accelerator/*']]
              - Sid: SNSArnResource
                Effect: Allow
                Action:
                  - 'sns:ListTopics'
                Resource:
                  - !Join ['', ['arn:aws:sns:', !Ref 'AWS::Region', ':', !Ref 'AWS::AccountId', ':*']]
              - Sid: logsArnResource
                Effect: Allow
                Action:
                  - 'cloudwatch:DeleteAlarms'
                  - 'cloudwatch:DescribeAlarms'
                  - 'cloudwatch:GetMetricData'
                  - 'cloudwatch:GetMetricStatistics'
                  - 'cloudwatch:ListMetrics'
                  - 'cloudwatch:PutMetricAlarm'
                  - 'cloudwatch:PutMetricData'
                  - 'logs:CreateLogGroup'
                  - 'logs:CreateLogStream'
                  - 'logs:DescribeLogStreams'
                  - 'logs:GetLogEvents'
                  - 'logs:PutLogEvents'
                Resource:
                  - !Join ['', ['arn:aws:logs:', !Ref 'AWS::Region', ':', !Ref 'AWS::AccountId', ':log-group:/aws/lambda/*']]
              - Sid: KmsArnResource
                Effect: Allow
                Action:
                  - 'kms:DescribeKey'
                  - 'kms:ListAliases'
                Resource:
                  - !Join ['', ['arn:aws:kms:', !Ref 'AWS::Region', ':', !Ref 'AWS::AccountId', ':key/*']]
              - Sid: ECRArnResource
                Effect: Allow
                Action:
                  - 'ecr:BatchCheckLayerAvailability'
                  - 'ecr:BatchGetImage'
                  - 'ecr:CreateRepository'
                  - 'ecr:GetAuthorizationToken'
                  - 'ecr:GetDownloadUrlForLayer'
                  - 'ecr:DescribeRepositories'
                  - 'ecr:DescribeImageScanFindings'
                  - 'ecr:DescribeRegistry'
                  - 'ecr:DescribeImages'
                Resource:
                  - !Join ['', ['arn:aws:ecr:', !Ref 'AWS::Region', ':', !Ref 'AWS::AccountId', ':repository/*']]
              - Sid: EC2ArnResource
                Effect: Allow
                Action:
                  - 'ec2:CreateNetworkInterface'
                  - 'ec2:CreateNetworkInterfacePermission'
                  - 'ec2:DeleteNetworkInterface'
                  - 'ec2:DeleteNetworkInterfacePermission'
                  - 'ec2:DescribeDhcpOptions'
                  - 'ec2:DescribeNetworkInterfaces'
                  - 'ec2:DescribeRouteTables'
                  - 'ec2:DescribeSecurityGroups'
                  - 'ec2:DescribeSubnets'
                  - 'ec2:DescribeVpcEndpoints'
                  - 'ec2:DescribeVpcs'
                Resource:
                  - !Join ['', ['arn:aws:ec2:', !Ref 'AWS::Region', ':', !Ref 'AWS::AccountId', ':instance/*']]
              - Sid: S3ArnResource
                Effect: Allow
                Action:
                  - 's3:CreateBucket'
                  - 's3:GetBucketLocation'
                  - 's3:ListBucket'
                Resource:
                  - !Join ['', ['arn:aws:s3::', ':*sagemaker*']]
              - Sid: LambdaInvokePermission
                Effect: Allow
                Action:
                  - 'lambda:ListFunctions'
                Resource:
                  - !Join ['', ['arn:aws:lambda:', !Ref 'AWS::Region', ':', !Ref 'AWS::AccountId', ':function', ':*']]
              - Effect: Allow
                Action: 'sagemaker:InvokeEndpoint'
                Resource:
                  - !Join ['', ['arn:aws:sagemaker:', !Ref 'AWS::Region', ':', !Ref 'AWS::AccountId', ':notebook-instance-lifecycle-config/*']]
                Condition:
                  StringEquals:
                    'aws:PrincipalTag/ProjectID': !Ref ProjectID
              - Effect: Allow
                Action:
                  - 'sagemaker:CreateTrainingJob'
                  - 'sagemaker:CreateEndpoint'
                  - 'sagemaker:CreateModel'
                  - 'sagemaker:CreateEndpointConfig'
                  - 'sagemaker:CreateHyperParameterTuningJob'
                  - 'sagemaker:CreateTransformJob'
                Resource:
                  - !Join ['', ['arn:aws:sagemaker:', !Ref 'AWS::Region', ':', !Ref 'AWS::AccountId', ':notebook-instance-lifecycle-config/*']]
                Condition:
                  StringEquals:
                    'aws:PrincipalTag/ProjectID': !Ref ProjectID
                  'ForAllValues:StringEquals':
                    'aws:TagKeys':
                      - Username
              - Effect: Allow
                Action:
                  - 'sagemaker:DescribeTrainingJob'
                  - 'sagemaker:DescribeEndpoint'
                  - 'sagemaker:DescribeEndpointConfig'
                Resource:
                  - !Join ['', ['arn:aws:sagemaker:', !Ref 'AWS::Region', ':', !Ref 'AWS::AccountId', ':notebook-instance-lifecycle-config/*']]
                Condition:
                  StringEquals:
                    'aws:PrincipalTag/ProjectID': !Ref ProjectID
              - Effect: Allow
                Action:
                  - 'sagemaker:DeleteTags'
                  - 'sagemaker:ListTags'
                  - 'sagemaker:DescribeNotebookInstance'
                  - 'sagemaker:ListNotebookInstanceLifecycleConfigs'
                  - 'sagemaker:DescribeModel'
                  - 'sagemaker:ListTrainingJobs'
                  - 'sagemaker:DescribeHyperParameterTuningJob'
                  - 'sagemaker:UpdateEndpointWeightsAndCapacities'
                  - 'sagemaker:ListHyperParameterTuningJobs'
                  - 'sagemaker:ListEndpointConfigs'
                  - 'sagemaker:DescribeNotebookInstanceLifecycleConfig'
                  - 'sagemaker:ListTrainingJobsForHyperParameterTuningJob'
                  - 'sagemaker:StopHyperParameterTuningJob'
                  - 'sagemaker:DescribeEndpointConfig'
                  - 'sagemaker:ListModels'
                  - 'sagemaker:AddTags'
                  - 'sagemaker:ListNotebookInstances'
                  - 'sagemaker:StopTrainingJob'
                  - 'sagemaker:ListEndpoints'
                  - 'sagemaker:DeleteEndpoint'
                Resource:
                  - !Join ['', ['arn:aws:sagemaker:', !Ref 'AWS::Region', ':', !Ref 'AWS::AccountId', ':notebook-instance-lifecycle-config/*']]
                Condition:
                  StringEquals:
                    'aws:PrincipalTag/ProjectID': !Ref ProjectID
              - Effect: Allow
                Action:
                  - 'ecr:SetRepositoryPolicy'
                  - 'ecr:CompleteLayerUpload'
                  - 'ecr:BatchDeleteImage'
                  - 'ecr:UploadLayerPart'
                  - 'ecr:DeleteRepositoryPolicy'
                  - 'ecr:InitiateLayerUpload'
                  - 'ecr:DeleteRepository'
                  - 'ecr:PutImage'
                Resource:
                  - !Join ['', ['arn:aws:ecr:', !Ref 'AWS::Region', ':', !Ref 'AWS::AccountId', ':repository/*sagemaker*']]
              - Effect: Allow
                Action:
                  - 's3:GetObject'
                  - 's3:ListBucket'
                  - 's3:PutObject'
                  - 's3:DeleteObject'
                Resource:
                  - !Join ['', ['arn:aws:s3:::', !Ref SagemakerS3Bucket]]
                  - !Join ['', ['arn:aws:s3:::', !Ref SagemakerS3Bucket, '/*']]
                Condition:
                  StringEquals:
                    'aws:PrincipalTag/ProjectID': !Ref ProjectID
              - Effect: Allow
                Action: 'iam:PassRole'
                Resource:
                  - !Join ['', ['arn:aws:iam::', !Ref 'AWS::AccountId', ':role/*']]
                Condition:
                  StringEquals:
                    'iam:PassedToService': sagemaker.amazonaws.com
  CodeBucketPolicy:
    Type: 'AWS::IAM::Policy'
    Condition: BucketCondition
    Properties:
      PolicyName: !Join ['', [!Ref ProjectName, CodeBucketPolicy]]
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - 's3:GetObject'
            Resource:
              - !Join ['', ['arn:aws:s3:::', !Ref CodeBucketName]]
              - !Join ['', ['arn:aws:s3:::', !Ref CodeBucketName, '/*']]
      Roles:
        - !Ref SageMakerExecutionRole
  SagemakerS3Bucket:
    Type: 'AWS::S3::Bucket'
    Properties:
      BucketEncryption:
        ServerSideEncryptionConfiguration:
          - ServerSideEncryptionByDefault:
              SSEAlgorithm: AES256
      Tags:
        - Key: ProjectID
          Value: !Ref ProjectID
        - Key: ProjectName
          Value: !Ref ProjectName
  S3Policy:
    Type: 'AWS::S3::BucketPolicy'
    Properties:
      Bucket: !Ref SagemakerS3Bucket
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Sid: AllowAccessFromVPCEndpoint
            Effect: Allow
            Principal: "*"
            Action:
              - 's3:Get*'
              - 's3:Put*'
              - 's3:List*'
              - 's3:DeleteObject'
            Resource:
              - !Join ['', ['arn:aws:s3:::', !Ref SagemakerS3Bucket]]
              - !Join ['', ['arn:aws:s3:::', !Ref SagemakerS3Bucket, '/*']]
            Condition:
              StringEquals:
                "aws:sourceVpce": "<PASTE S3 VPC ENDPOINT ID>"
  EFSLifecycleConfig:
    Type: 'AWS::SageMaker::NotebookInstanceLifecycleConfig'
    Properties:
      NotebookInstanceLifecycleConfigName: 'Provisioned-LC'
      OnCreate:
        - Content: !Base64
            'Fn::Join':
              - ''
              - - |
                  #!/bin/bash
                - |
                  aws configure set sts_regional_endpoints regional
                - yes | cp -rf ~/.aws/config /home/ec2-user/.aws/config
      OnStart:
        - Content: !Base64
            'Fn::Join':
              - ''
              - - |
                  #!/bin/bash
                - |
                  aws configure set sts_regional_endpoints regional
                - yes | cp -rf ~/.aws/config /home/ec2-user/.aws/config
  EFSLifecycleConfigForS3:
    Type: 'AWS::SageMaker::NotebookInstanceLifecycleConfig'
    Properties:
      NotebookInstanceLifecycleConfigName: 'Provisioned-LC-S3'
      OnCreate:
        - Content: !Base64
            'Fn::Join':
              - ''
              - - |
                  #!/bin/bash
                - |
                  # Copy Content
                - !Sub >
                  aws s3 cp s3://${CodeBucketName} /home/ec2-user/SageMaker/ --recursive
                - |
                  # Set sts endpoint
                - >
                  aws configure set sts_regional_endpoints regional
                - yes | cp -rf ~/.aws/config /home/ec2-user/.aws/config
      OnStart:
        - Content: !Base64
            'Fn::Join':
              - ''
              - - |
                  #!/bin/bash
                - |
                  aws configure set sts_regional_endpoints regional
                - yes | cp -rf ~/.aws/config /home/ec2-user/.aws/config
  SageMakerCustomResource:
    Type: 'Custom::SageMakerCustomResource'
    DependsOn: S3Policy
    Properties:
      ServiceToken: !Ref SageMakerBuildFunctionARN
      NotebookInstanceName: !Ref NotebookInstanceName
      NotebookInstanceType: !Ref NotebookInstanceType
      KmsKeyId: !Ref SagemakerKMSKey
      ENVName: !Join ['', [!Ref ENVName, !Sub Subnet1Id]]
      Subnet: !Ref SubnetName
      SecurityGroupName: !Ref SecurityGroupName
      ProjectName: !Ref ProjectName
      RootAccess: !Ref RootAccess
      VolumeSizeInGB: !Ref VolumeSizeInGB
      LifecycleConfigName: !If
        - BucketCondition
        - !GetAtt EFSLifecycleConfigForS3.NotebookInstanceLifecycleConfigName
        - !GetAtt EFSLifecycleConfig.NotebookInstanceLifecycleConfigName
      DirectInternetAccess: !Ref DirectInternetAccess
      RoleArn: !GetAtt
        - SageMakerExecutionRole
        - Arn
      Tags:
        - Key: ProjectID
          Value: !Ref ProjectID
        - Key: ProjectName
          Value: !Ref ProjectName
Outputs:
  Message:
    Description: Execution Status
    Value: !GetAtt
      - SageMakerCustomResource
      - Message
  SagemakerKMSKey:
    Description: KMS Key for encrypting Sagemaker resource
    Value: !Ref KeyAlias
  ExecutionRoleArn:
    Description: ARN of the Sagemaker Execution Role
    Value: !Ref SageMakerExecutionRole
  S3BucketName:
    Description: S3 bucket for SageMaker Notebook operation
    Value: !Ref SagemakerS3Bucket
  NotebookInstanceName:
    Description: Name of the Sagemaker Notebook instance created
    Value: !Ref NotebookInstanceName
  ProjectName:
    Description: Project ID used for SageMaker deployment
    Value: !Ref ProjectName
  ProjectID:
    Description: Project ID used for SageMaker deployment
    Value: !Ref ProjectID
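The upload in this step can also be scripted. The sketch below is one possible approach, not the author's procedure: it uploads the template with boto3, asks CloudFormation to validate it, and returns the object URL that a later step pastes into Service Catalog. The function names, bucket, key, and region are hypothetical, and the URL format assumes a standard commercial AWS region.

```python
from urllib.parse import quote


def template_url(bucket: str, key: str, region: str) -> str:
    """Build the virtual-hosted-style HTTPS URL of an object in S3."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


def upload_and_validate(path: str, bucket: str, key: str, region: str) -> str:
    """Upload the template, ask CloudFormation to validate it, return its URL."""
    import boto3  # imported lazily so template_url() works without boto3 installed

    boto3.client("s3", region_name=region).upload_file(path, bucket, key)
    url = template_url(bucket, key, region)
    # Raises a ClientError if the template is not syntactically valid.
    boto3.client("cloudformation", region_name=region).validate_template(TemplateURL=url)
    return url


if __name__ == "__main__":
    # Placeholder bucket and key; no AWS call is made here.
    print(template_url("my-template-bucket", "sagemaker/notebook.yaml", "us-east-1"))
```

Validating before registering the product catches indentation mistakes early, which is useful because the template is long and its failures otherwise only surface when an end user launches the product.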
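3. Next, open the VPC service home page, go to Endpoints, and click Create endpoint to create a VPC endpoint that SageMaker will use to privately and securely access the large model files in the S3 bucket.
4. Name the endpoint "s3-endpoint", choose AWS services as the service category, and select s3 as the service.
5. Select the VPC the endpoint belongs to, configure the route table, and click Create.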
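Steps 3 to 5 can be approximated with boto3 as sketched below. Because the console steps configure a route table, the sketch assumes a Gateway-type endpoint; the function names and the example VPC and route table IDs are hypothetical.

```python
from typing import Iterable


def build_s3_gateway_endpoint_request(
    vpc_id: str, route_table_ids: Iterable[str], region: str, name: str = "s3-endpoint"
) -> dict:
    """Build keyword arguments for ec2.create_vpc_endpoint (S3 gateway endpoint)."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
        "TagSpecifications": [
            {"ResourceType": "vpc-endpoint", "Tags": [{"Key": "Name", "Value": name}]}
        ],
    }


def create_s3_gateway_endpoint(vpc_id: str, route_table_ids: Iterable[str], region: str) -> str:
    """Create the endpoint and return its ID (for example 'vpce-...')."""
    import boto3  # imported lazily; only needed when actually calling AWS

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.create_vpc_endpoint(
        **build_s3_gateway_endpoint_request(vpc_id, route_table_ids, region)
    )
    return response["VpcEndpoint"]["VpcEndpointId"]


if __name__ == "__main__":
    # Placeholder IDs; prints the request without contacting AWS.
    print(build_s3_gateway_endpoint_request("vpc-0123456789abcdef0", ["rtb-0123456789abcdef0"], "us-east-1"))
```

The returned endpoint ID is the value that replaces the "<PASTE S3 VPC ENDPOINT ID>" placeholder in the template's bucket policy, so the template should be updated and re-uploaded once the endpoint exists.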
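6. Next, open the AWS Service Catalog home page, go to Portfolios, and click Create to create a new portfolio, which manages a whole service made up of different cloud resources in one place.
7. Name the portfolio "SageMakerPortfolio" and set the owner to CQ.
8. Next, add cloud resources to the portfolio by clicking "Create product".
9. Choose to create the Product from a CloudFormation IaC template, name the Product "SageMakerProduct", and set the owner to CQ.
10. Add the CloudFormation template to the Product. Provide it by URL, entering the URL of the template uploaded to S3 in step 2, set the version to 1, and click Create to create the Product.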
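For readers who prefer scripting steps 6 to 10, a minimal boto3 sketch follows. It uses the names from the steps above ("SageMakerPortfolio", "SageMakerProduct", owner CQ, version 1); the helper function names are hypothetical, and the template URL is whatever URL the uploaded template has in your account.

```python
def build_portfolio_request(display_name: str, provider_name: str) -> dict:
    """Keyword arguments for servicecatalog.create_portfolio."""
    return {"DisplayName": display_name, "ProviderName": provider_name}


def build_product_request(name: str, owner: str, template_url: str, version_name: str = "1") -> dict:
    """Keyword arguments for servicecatalog.create_product (CloudFormation product)."""
    return {
        "Name": name,
        "Owner": owner,
        "ProductType": "CLOUD_FORMATION_TEMPLATE",
        "ProvisioningArtifactParameters": {
            "Name": version_name,
            "Type": "CLOUD_FORMATION_TEMPLATE",
            "Info": {"LoadTemplateFromURL": template_url},
        },
    }


def create_portfolio_with_product(template_url: str, region: str) -> tuple:
    """Create the portfolio and product, link them, and return (portfolio_id, product_id)."""
    import boto3  # imported lazily; only needed when actually calling AWS

    sc = boto3.client("servicecatalog", region_name=region)
    portfolio = sc.create_portfolio(**build_portfolio_request("SageMakerPortfolio", "CQ"))
    product = sc.create_product(**build_product_request("SageMakerProduct", "CQ", template_url))
    portfolio_id = portfolio["PortfolioDetail"]["Id"]
    product_id = product["ProductViewDetail"]["ProductViewSummary"]["ProductId"]
    sc.associate_product_with_portfolio(ProductId=product_id, PortfolioId=portfolio_id)
    return portfolio_id, product_id


if __name__ == "__main__":
    # Placeholder URL; prints the request without contacting AWS.
    print(build_product_request("SageMakerProduct", "CQ", "https://example.com/notebook.yaml"))
```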
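11. Next, go to the Constraints page and click Create to add a constraint, which uses permission management to limit what can be done to cloud resources through the Service Catalog Product.
12. Select the Product we just created, "SageMakerProduct", as the target of the constraint, and choose Launch as the constraint type.
13. Attach an IAM role to the constraint. The IAM role carries the permission rules for the Product. Then click Create.
14. Next, click Access and grant access to limit which users can reach the Product's cloud resources.
15. Add the role "SCEndUserRole", which is used to access the Product and create cloud resources on behalf of users.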
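Steps 11 to 15 map onto two Service Catalog API calls, sketched below with boto3. The helper names and the example IDs and ARNs are hypothetical; "SCEndUserRole" is the role named in step 15, while the launch role ARN stands in for whichever IAM role was attached in step 13.

```python
import json


def build_launch_constraint_request(portfolio_id: str, product_id: str, role_arn: str) -> dict:
    """Keyword arguments for servicecatalog.create_constraint (launch constraint)."""
    return {
        "PortfolioId": portfolio_id,
        "ProductId": product_id,
        "Type": "LAUNCH",
        # Service Catalog expects the constraint parameters as a JSON string.
        "Parameters": json.dumps({"RoleArn": role_arn}),
    }


def build_principal_request(portfolio_id: str, principal_arn: str) -> dict:
    """Keyword arguments for servicecatalog.associate_principal_with_portfolio."""
    return {"PortfolioId": portfolio_id, "PrincipalARN": principal_arn, "PrincipalType": "IAM"}


def restrict_and_share(portfolio_id, product_id, launch_role_arn, end_user_role_arn, region):
    """Apply the launch constraint, then grant the end-user role access to the portfolio."""
    import boto3  # imported lazily; only needed when actually calling AWS

    sc = boto3.client("servicecatalog", region_name=region)
    sc.create_constraint(**build_launch_constraint_request(portfolio_id, product_id, launch_role_arn))
    sc.associate_principal_with_portfolio(**build_principal_request(portfolio_id, end_user_role_arn))


if __name__ == "__main__":
    # Placeholder IDs and ARN; prints the request without contacting AWS.
    print(build_launch_constraint_request(
        "port-example", "prod-example", "arn:aws:iam::123456789012:role/ExampleLaunchRole"))
```

With a launch constraint in place, Service Catalog provisions the stack using the launch role, so the end-user role only needs permission to use Service Catalog rather than to create SageMaker, KMS, or IAM resources directly.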
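16. Next, use the Service Catalog Product to create a series of cloud resources. Select the Product we just created and click Launch.
17. Name the provisioned product "DataScientistProduct" and select version 1, created in the earlier step.
18. Configure the parameters for the SageMaker notebook that the Product will create, including the environment name and the instance name.
19. Enter the ARN of the Lambda function created at the very beginning, then click Launch to start provisioning.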
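The launch in steps 16 to 19 can be sketched with boto3 as follows. The parameter keys (ENVName, NotebookInstanceName, ProjectName, SageMakerBuildFunctionARN) are parameters declared in the template above; the helper names and every example value are hypothetical.

```python
def build_provision_request(
    product_name: str, version_name: str, provisioned_name: str, parameters: dict
) -> dict:
    """Keyword arguments for servicecatalog.provision_product."""
    return {
        "ProductName": product_name,
        "ProvisioningArtifactName": version_name,
        "ProvisionedProductName": provisioned_name,
        # Service Catalog takes parameters as a list of Key/Value string pairs.
        "ProvisioningParameters": [
            {"Key": key, "Value": str(value)} for key, value in parameters.items()
        ],
    }


def launch_product(parameters: dict, region: str) -> str:
    """Launch the product as the end user and return the provisioning record ID."""
    import boto3  # imported lazily; only needed when actually calling AWS

    sc = boto3.client("servicecatalog", region_name=region)
    response = sc.provision_product(
        **build_provision_request("SageMakerProduct", "1", "DataScientistProduct", parameters)
    )
    return response["RecordDetail"]["RecordId"]


if __name__ == "__main__":
    # Example values only; the Lambda ARN must be the ARN of your SageMakerBuild function.
    example = {
        "ENVName": "ProSM",
        "NotebookInstanceName": "example-notebook",
        "ProjectName": "demo",
        "SageMakerBuildFunctionARN": "arn:aws:lambda:us-east-1:123456789012:function:SageMakerBuild",
    }
    print(build_provision_request("SageMakerProduct", "1", "DataScientistProduct", example))
```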
20. Finally, return to the SageMaker home page, where the new Jupyter Notebook instance created through the Service Catalog Product appears. We can now use this instance to develop our AI applications.
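Beyond checking the console, the security settings of the new notebook can be confirmed from the DescribeNotebookInstance response. The sketch below is an assumed verification helper, not part of the original walkthrough; it checks the fields that the template is meant to enforce, and the function names are hypothetical.

```python
def notebook_findings(description: dict) -> list:
    """Return human-readable problems found in a DescribeNotebookInstance response."""
    findings = []
    if description.get("NotebookInstanceStatus") != "InService":
        findings.append("notebook is not InService")
    if description.get("DirectInternetAccess") != "Disabled":
        findings.append("direct internet access is not disabled")
    if not description.get("KmsKeyId"):
        findings.append("no KMS key is attached to the storage volume")
    if not description.get("SubnetId"):
        findings.append("notebook is not placed in a VPC subnet")
    return findings


def check_notebook(name: str, region: str) -> list:
    """Fetch the notebook description from SageMaker and evaluate it."""
    import boto3  # imported lazily; only needed when actually calling AWS

    sagemaker = boto3.client("sagemaker", region_name=region)
    return notebook_findings(sagemaker.describe_notebook_instance(NotebookInstanceName=name))


if __name__ == "__main__":
    # Canned response for a local dry run; no AWS call is made.
    sample = {
        "NotebookInstanceStatus": "InService",
        "DirectInternetAccess": "Disabled",
        "KmsKeyId": "example-key-id",
        "SubnetId": "subnet-0123456789abcdef0",
    }
    print(notebook_findings(sample))
```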
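That covers all the steps for training large AI models and developing AI applications on AWS in a secure and compliant way. Join me in future posts for more cutting-edge generative AI development solutions.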