merlin.batch package¶
Submodules¶
merlin.batch.big_query_util module¶
- 
merlin.batch.big_query_util.valid_column(column_name: str) → bool[source]¶
- Validate BigQuery column name - Parameters: - column_name – BigQuery column name - Returns: - boolean - Rules based on https://cloud.google.com/bigquery/docs/schemas#column_names:
  - A column name must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_)
  - It must start with a letter or underscore
  - Maximum length is 128 characters
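The rules above can be sketched as a single regex plus a length check. This is a minimal re-implementation for illustration only; the actual `valid_column` in `merlin.batch.big_query_util` may be implemented differently:

```python
import re

# Rules from https://cloud.google.com/bigquery/docs/schemas#column_names:
# letters, numbers, or underscores only; must start with a letter or
# underscore; at most 128 characters.
COLUMN_PATTERN = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")

def valid_column_sketch(column_name: str) -> bool:
    # Length cap first, then require the whole name to match the pattern.
    return len(column_name) <= 128 and COLUMN_PATTERN.match(column_name) is not None
```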
- 
merlin.batch.big_query_util.valid_columns(columns) → bool[source]¶
- Validate multiple BigQuery columns - Parameters: - columns – List of column names - Returns: - boolean 
- 
merlin.batch.big_query_util.valid_dataset(dataset: str) → bool[source]¶
- Validate BigQuery dataset name - Parameters: - dataset – BigQuery dataset name - Returns: - boolean - Rules based on https://cloud.google.com/bigquery/docs/datasets#dataset-naming:
  - May contain up to 1,024 characters
  - Can contain letters (upper or lower case), numbers, and underscores
- 
merlin.batch.big_query_util.valid_table_id(table_id: str) → bool[source]¶
- Validate BigQuery source table id, which must satisfy the format project_id.dataset.table - Parameters: - table_id – Source table id - Returns: - boolean 
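The project_id.dataset.table format can be checked by splitting on dots and requiring three non-empty parts. A minimal sketch for illustration (the function name is hypothetical, and the library's version presumably also applies the dataset and table naming rules to each part):

```python
def valid_table_id_sketch(table_id: str) -> bool:
    # Expect exactly three dot-separated, non-empty parts:
    # project_id, dataset, and table.
    parts = table_id.split(".")
    return len(parts) == 3 and all(parts)
```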
- 
merlin.batch.big_query_util.valid_table_name(table_name: str) → bool[source]¶
- Validate BigQuery table name - Parameters: - table_name – BigQuery table name - Returns: - boolean - Rules based on https://cloud.google.com/bigquery/docs/tables#table_naming:
  - A table name must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_)
  - Maximum length is 1,024 characters
- 
merlin.batch.big_query_util.validate_text(text: str, pattern: str, max_length: int) → bool[source]¶
- Validate text based on regex pattern and maximum length allowed - Parameters: - text – Text to validate
- pattern – Regular expression pattern to validate text
- max_length – Maximum length allowed
 - Returns: - boolean 
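validate_text combines a whole-string regex check with a length cap. A minimal sketch of that behavior (assuming fullmatch-style semantics, which may differ from the library's implementation):

```python
import re

def validate_text_sketch(text: str, pattern: str, max_length: int) -> bool:
    # Reject text that exceeds the maximum length, then require the
    # entire string to match the given pattern.
    if len(text) > max_length:
        return False
    return re.fullmatch(pattern, text) is not None
```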
merlin.batch.config module¶
- 
class merlin.batch.config.PredictionJobConfig(source: merlin.batch.source.Source, sink: merlin.batch.sink.Sink, service_account_name: str, result_type: merlin.batch.config.ResultType = <ResultType.DOUBLE: 'DOUBLE'>, item_type: merlin.batch.config.ResultType = <ResultType.DOUBLE: 'DOUBLE'>, resource_request: merlin.batch.config.PredictionJobResourceRequest = None, env_vars: Dict[str, str] = None)[source]¶
- Bases: - object- 
__init__(source: merlin.batch.source.Source, sink: merlin.batch.sink.Sink, service_account_name: str, result_type: merlin.batch.config.ResultType = <ResultType.DOUBLE: 'DOUBLE'>, item_type: merlin.batch.config.ResultType = <ResultType.DOUBLE: 'DOUBLE'>, resource_request: merlin.batch.config.PredictionJobResourceRequest = None, env_vars: Dict[str, str] = None)[source]¶
- Create configuration for starting a prediction job - Parameters: - source – source configuration. See merlin.batch.source package.
- sink – sink configuration. See merlin.batch.sink package
- service_account_name – secret name containing the service account for executing the prediction job.
- result_type – type of the prediction result (defaults to ResultType.DOUBLE).
- item_type – item type of the prediction result if result_type is ResultType.ARRAY; otherwise it is ignored.
- resource_request – optional resource request for starting the prediction job. If not given, the system default will be used.
- env_vars – optional environment variables as a dictionary of key-value pairs.
 
 - 
env_vars¶
 - 
item_type¶
 - 
resource_request¶
 - 
result_type¶
 - 
service_account_name¶
 - 
sink¶
 - 
source¶
 
- 
- 
class merlin.batch.config.PredictionJobResourceRequest(driver_cpu_request: str, driver_memory_request: str, executor_cpu_request: str, executor_memory_request: str, executor_replica: int)[source]¶
- Bases: - object- Resource request configuration for starting prediction job - 
__init__(driver_cpu_request: str, driver_memory_request: str, executor_cpu_request: str, executor_memory_request: str, executor_replica: int)[source]¶
- Create resource request object - Parameters: - driver_cpu_request – driver’s CPU request in Kubernetes quantity format (e.g. 500m, 1, 2)
- driver_memory_request – driver’s memory request in Kubernetes quantity format (e.g. 512Mi, 1Gi, 2Gi)
- executor_cpu_request – executor’s CPU request in Kubernetes quantity format (e.g. 500m, 1, 2)
- executor_memory_request – executor’s memory request in Kubernetes quantity format (e.g. 512Mi, 1Gi, 2Gi)
- executor_replica – number of executors to be used
 
 
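The Kubernetes quantity strings above have well-defined meanings: 500m is 500 millicores (half a CPU), and Mi/Gi are binary (1024-based) memory suffixes. A small sketch for interpreting them, for illustration only (the SDK presumably forwards these strings to Kubernetes unparsed):

```python
def parse_cpu(quantity: str) -> float:
    # "500m" means 500 millicores (0.5 CPU); plain numbers are whole cores.
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000.0
    return float(quantity)

def parse_memory(quantity: str) -> int:
    # Binary suffixes: 1Mi = 1024**2 bytes, 1Gi = 1024**3 bytes.
    suffixes = {"Mi": 1024 ** 2, "Gi": 1024 ** 3}
    for suffix, factor in suffixes.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain bytes
```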
- 
merlin.batch.job module¶
- 
class merlin.batch.job.JobStatus[source]¶
- Bases: - enum.Enum- An enumeration. - 
COMPLETED= 'completed'¶
 - 
FAILED= 'failed'¶
 - 
FAILED_SUBMISSION= 'failed_submission'¶
 - 
PENDING= 'pending'¶
 - 
RUNNING= 'running'¶
 - 
TERMINATED= 'terminated'¶
 - 
TERMINATING= 'terminating'¶
 
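Judging by the names, completed, failed, failed_submission, and terminated are terminal states, while pending, running, and terminating are transient; the SDK does not document this explicitly, so treat it as an assumption. A sketch of a terminal-state check, re-declaring the enum locally for illustration (is_terminal is a hypothetical helper, not part of the SDK):

```python
from enum import Enum

class JobStatus(Enum):  # mirrors merlin.batch.job.JobStatus
    PENDING = "pending"
    RUNNING = "running"
    TERMINATING = "terminating"
    TERMINATED = "terminated"
    COMPLETED = "completed"
    FAILED = "failed"
    FAILED_SUBMISSION = "failed_submission"

# States from which a job will not progress further (an assumption based
# on the state names).
TERMINAL_STATES = {
    JobStatus.COMPLETED,
    JobStatus.FAILED,
    JobStatus.FAILED_SUBMISSION,
    JobStatus.TERMINATED,
}

def is_terminal(status: JobStatus) -> bool:
    return status in TERMINAL_STATES
```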
- 
- 
class merlin.batch.job.PredictionJob(job: client.models.prediction_job.PredictionJob, api_client: client.api_client.ApiClient)[source]¶
- Bases: - object- 
error¶
- Error message containing the reason the job failed - Returns: - str 
 - 
id¶
- ID of prediction job - Returns: - int 
 - 
name¶
- Prediction job name - Returns: - str 
 - 
status¶
- Prediction job status - Returns: - JobStatus 
 
- 
merlin.batch.sink module¶
- 
class merlin.batch.sink.BigQuerySink(table: str, staging_bucket: str, result_column: str, save_mode: merlin.batch.sink.SaveMode = <SaveMode.ERRORIFEXISTS: 0>, options: MutableMapping[str, str] = None)[source]¶
- Bases: - merlin.batch.sink.Sink- Sink contract for BigQuery to create prediction job - 
__init__(table: str, staging_bucket: str, result_column: str, save_mode: merlin.batch.sink.SaveMode = <SaveMode.ERRORIFEXISTS: 0>, options: MutableMapping[str, str] = None)[source]¶
- Parameters: - table – table id of the destination BQ table in the format gcp-project.dataset.table_name
- staging_bucket – temporary GCS bucket for staging writes into the BQ table
- result_column – column name that will be used to store the prediction result.
- save_mode – save mode. Defaults to SaveMode.ERRORIFEXISTS, which will fail if the destination table already exists.
- options – additional sink options to configure the prediction job.
 
 - 
options¶
 - 
result_column¶
 - 
save_mode¶
 - 
staging_bucket¶
 - 
table¶
 
- 
merlin.batch.source module¶
- 
class merlin.batch.source.BigQuerySource(table: str, features: Iterable[str], options: MutableMapping[str, str] = None)[source]¶
- Bases: - merlin.batch.source.Source- Source contract for BigQuery to create prediction job - 
__init__(table: str, features: Iterable[str], options: MutableMapping[str, str] = None)[source]¶
- Parameters: - table – table id of the source in the format gcp-project.dataset.table_name
- features – list of features to be used for prediction; they must match column names in the source table.
- options – additional options to configure the source.
 
 - 
features¶
 - 
options¶
 - 
table¶
 
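Putting the pieces together, a prediction job configuration might be assembled like this (a sketch using only the classes documented above; the project, dataset, bucket, and secret names are placeholders):

```python
from merlin.batch.config import (
    PredictionJobConfig,
    PredictionJobResourceRequest,
    ResultType,
)
from merlin.batch.sink import BigQuerySink, SaveMode
from merlin.batch.source import BigQuerySource

# Source: which BQ table and columns to read features from.
source = BigQuerySource(
    table="my-project.my_dataset.features",          # placeholder table id
    features=["customer_id", "age", "total_spend"],  # must match source columns
)

# Sink: where to write predictions, staged through a GCS bucket.
sink = BigQuerySink(
    table="my-project.my_dataset.predictions",  # placeholder destination
    staging_bucket="my-staging-bucket",         # placeholder GCS bucket
    result_column="prediction",
    save_mode=SaveMode.ERRORIFEXISTS,
)

job_config = PredictionJobConfig(
    source=source,
    sink=sink,
    service_account_name="my-sa-secret",  # placeholder secret name
    result_type=ResultType.DOUBLE,
    resource_request=PredictionJobResourceRequest(
        driver_cpu_request="1",
        driver_memory_request="1Gi",
        executor_cpu_request="2",
        executor_memory_request="2Gi",
        executor_replica=3,
    ),
)
```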