1. Train the model: create a bucket, define a job, and submit it.
BUCKET_NAME=gs://${USER}_yt8m_train_bucket_logisticmodel
# (One Time) Create a storage bucket to store training logs and checkpoints.
gsutil mb -l us-east1 $BUCKET_NAME
# Submit the training job.
JOB_NAME=yt8m_train_LogisticModel$(date +%Y%m%d_%H%M%S); gcloud --verbosity=debug ml-engine jobs \
submit training $JOB_NAME \
--package-path=youtube-8m --module-name=youtube-8m.train \
--staging-bucket=$BUCKET_NAME --region=us-east1 \
--config=youtube-8m/cloudml-gpu.yaml \
-- --train_data_pattern='gs://youtube8m-ml-us-east1/1/video_level/train/train*.tfrecord' \
--model=LogisticModel \
--train_dir=$BUCKET_NAME/yt8m_train_video_level_logistic_model

BUCKET_NAME=gs://${USER}_yt8m_train_bucket_lstmmodel
gsutil mb -l us-east1 $BUCKET_NAME
JOB_NAME=yt8m_train_LstmModel$(date +%Y%m%d_%H%M%S); gcloud --verbosity=debug ml-engine jobs \
submit training $JOB_NAME \
--package-path=youtube-8m --module-name=youtube-8m.train \
--staging-bucket=$BUCKET_NAME --region=us-east1 \
--config=youtube-8m/cloudml-gpu.yaml \
-- --train_data_pattern='gs://youtube8m-ml-us-east1/1/frame_level/train/train*.tfrecord' \
--frame_features=True --model=LstmModel --feature_names="rgb" \
--feature_sizes="1024" --batch_size=128 \
--train_dir=$BUCKET_NAME/yt8m_train_frame_level_lstmModel

BUCKET_NAME=gs://${USER}_yt8m_train_bucket_framelevellogisticmodel
gsutil mb -l us-east1 $BUCKET_NAME
JOB_NAME=yt8m_train_FrameLevelLogisticModel$(date +%Y%m%d_%H%M%S); gcloud --verbosity=debug ml-engine jobs \
submit training $JOB_NAME \
--package-path=youtube-8m --module-name=youtube-8m.train \
--staging-bucket=$BUCKET_NAME --region=us-east1 \
--config=youtube-8m/cloudml-gpu.yaml \
-- --train_data_pattern='gs://youtube8m-ml-us-east1/1/frame_level/train/train*.tfrecord' \
--frame_features=True --model=FrameLevelLogisticModel --feature_names="rgb" \
--feature_sizes="1024" --batch_size=128 \
--train_dir=$BUCKET_NAME/yt8m_train_video_framelevel_logisticmodel

BUCKET_NAME=gs://${USER}_yt8m_train_bucket_dbofmodel
gsutil mb -l us-east1 $BUCKET_NAME
JOB_NAME=yt8m_train_DbofModel$(date +%Y%m%d_%H%M%S); gcloud --verbosity=debug ml-engine jobs \
submit training $JOB_NAME \
--package-path=youtube-8m --module-name=youtube-8m.train \
--staging-bucket=$BUCKET_NAME --region=us-east1 \
--config=youtube-8m/cloudml-gpu.yaml \
-- --train_data_pattern='gs://youtube8m-ml-us-east1/1/frame_level/train/train*.tfrecord' \
--frame_features=True --model=DbofModel --feature_names="rgb" \
--feature_sizes="1024" --batch_size=128 \
--train_dir=$BUCKET_NAME/yt8m_train_frame_level_dbofmodel
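The four submissions above differ only in the model name and in whether frame-level features are used, so the command can be generated instead of copied. The following is a minimal sketch, not part of the original notes: train_cmd is a hypothetical helper that only prints the command, and it derives the bucket and train_dir names from the model name, so the directory names can differ slightly from the hand-written ones above.

```shell
# Hypothetical helper: print (do not run) the training submission command.
# Usage: train_cmd MODEL LEVEL    where LEVEL is video_level or frame_level.
# Assumption: bucket and train_dir names are derived from the lowercased
# model name; adjust them if your buckets are named differently.
train_cmd() {
  model=$1
  level=$2
  suffix=$(printf '%s' "$model" | tr '[:upper:]' '[:lower:]')
  bucket="gs://${USER}_yt8m_train_bucket_${suffix}"
  job="yt8m_train_${model}$(date +%Y%m%d_%H%M%S)"

  # Frame-level models need the extra feature flags used in the notes.
  extra=""
  if [ "$level" = "frame_level" ]; then
    extra=' --frame_features=True --feature_names="rgb" --feature_sizes="1024" --batch_size=128'
  fi

  # Each piece is printed followed by a single space; the last one ends the line.
  printf '%s ' \
    "gcloud --verbosity=debug ml-engine jobs submit training ${job}" \
    "--package-path=youtube-8m --module-name=youtube-8m.train" \
    "--staging-bucket=${bucket} --region=us-east1" \
    "--config=youtube-8m/cloudml-gpu.yaml" \
    "-- --train_data_pattern='gs://youtube8m-ml-us-east1/1/${level}/train/train*.tfrecord'" \
    "--model=${model}${extra}"
  printf '%s\n' "--train_dir=${bucket}/yt8m_train_${level}_${suffix}"
}
```

Running train_cmd LstmModel frame_level prints the command for review. Once the bucket exists (gsutil mb, as above), the printed command can be pasted into the shell or run with eval "$(train_cmd LstmModel frame_level)".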
2. View the logs and the training progress
Click Logging in the console sidebar to see the program's output.
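The same information is also available from the command line. Below is a small sketch, assuming the gcloud ml-engine command group used for submission is installed; run, job_state, and job_logs are hypothetical helper names, and DRY_RUN is an illustrative switch that prints a command instead of executing it.

```shell
# Run a command, or only print it when DRY_RUN=1 (useful for checking the
# arguments before touching the cloud project).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Print the current state of a job, for example RUNNING or SUCCEEDED.
job_state() {
  run gcloud ml-engine jobs describe "$1" --format='value(state)'
}

# Follow the job's log output in the terminal.
job_logs() {
  run gcloud ml-engine jobs stream-logs "$1"
}
```

For example, job_logs $JOB_NAME follows the job submitted in step 1, provided JOB_NAME still holds that job's name in the current shell.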
TensorBoard: https://cloud.google.com/ml-engine/docs/how-tos/getting-started-training-prediction#tensorboard-local
OUTPUT=$BUCKET_NAME/yt8m_train_video_framelevel_logisticmodel  # the same value that was passed as --train_dir
python -m tensorflow.tensorboard --logdir=$OUTPUT --port=8080
Select "Preview on port 8080" from the Web Preview menu at the top of the command-line.
3. Run inference on the test set:
# JOB_TO_EVAL is the train_dir name (under $BUCKET_NAME) of the model to run
# inference with; set BUCKET_NAME and JOB_TO_EVAL to match the training run.
JOB_TO_EVAL=yt8m_train_video_level_logistic_model
JOB_NAME=yt8m_inference_$(date +%Y%m%d_%H%M%S); gcloud --verbosity=debug ml-engine jobs \
submit training $JOB_NAME \
--package-path=youtube-8m --module-name=youtube-8m.inference \
--staging-bucket=$BUCKET_NAME --region=us-east1 \
--config=youtube-8m/cloudml-gpu.yaml \
-- --input_data_pattern='gs://youtube8m-ml/1/video_level/test/test*.tfrecord' \
--train_dir=$BUCKET_NAME/${JOB_TO_EVAL} \
--output_file=$BUCKET_NAME/${JOB_TO_EVAL}/predictions.csv

JOB_NAME=yt8m_dbofmodel_inference_$(date +%Y%m%d_%H%M%S); gcloud --verbosity=debug ml-engine jobs \
submit training $JOB_NAME \
--package-path=youtube-8m --module-name=youtube-8m.inference \
--staging-bucket=$BUCKET_NAME --region=us-east1 \
--config=youtube-8m/cloudml-gpu.yaml \
-- --input_data_pattern='gs://youtube8m-ml-us-east1/1/frame_level/test/test*.tfrecord' \
--frame_features=True --model=FrameLevelLogisticModel --feature_names="rgb" \
--feature_sizes="1024" --batch_size=128 \
--train_dir=$BUCKET_NAME/${JOB_TO_EVAL} \
--output_file=$BUCKET_NAME/${JOB_TO_EVAL}/predictions.csv

JOB_NAME=yt8m_framelevellogistic_inference_$(date +%Y%m%d_%H%M%S); gcloud --verbosity=debug ml-engine jobs \
submit training $JOB_NAME \
--package-path=youtube-8m --module-name=youtube-8m.inference \
--staging-bucket=$BUCKET_NAME --region=us-east1 \
--config=youtube-8m/cloudml-gpu.yaml \
-- --input_data_pattern='gs://youtube8m-ml-us-east1/1/frame_level/test/test*.tfrecord' \
--frame_features=True --model=FrameLevelLogisticModel --feature_names="rgb" \
--feature_sizes="1024" --batch_size=128 \
--train_dir=$BUCKET_NAME/${JOB_TO_EVAL} \
--output_file=$BUCKET_NAME/${JOB_TO_EVAL}/predictions.csv
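Once an inference job finishes, a quick look at predictions.csv helps confirm that it produced sensible output. The sketch below assumes the file has a VideoId,LabelConfidencePairs header followed by rows of the form id,label confidence label confidence ...; top_labels is a hypothetical helper that prints the highest-confidence label for each video.

```shell
# Print "video_id top_label confidence" for every data row.
# Assumption: the first line is a header, and the second comma-separated
# field holds space-separated "label confidence" pairs.
top_labels() {
  awk -F',' 'NR > 1 {
    n = split($2, f, " ")
    best = 1
    # Walk the pairs and remember the one with the largest confidence.
    for (i = 1; i + 1 <= n; i += 2) {
      if (f[i + 1] + 0 > f[best + 1] + 0) best = i
    }
    if (n >= 2) print $1, f[best], f[best + 1]
  }' "$@"
}
```

For example, gsutil cat $BUCKET_NAME/${JOB_TO_EVAL}/predictions.csv | top_labels | head shows the top label for the first few videos without downloading the file.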