prompting.validators.reward#
Submodules#
- prompting.validators.reward.blacklist
- prompting.validators.reward.config
- prompting.validators.reward.dahoas
- prompting.validators.reward.diversity
- prompting.validators.reward.dpo
- prompting.validators.reward.nsfw
- prompting.validators.reward.open_assistant
- prompting.validators.reward.prompt
- prompting.validators.reward.reciprocate
- prompting.validators.reward.relevance
- prompting.validators.reward.reward
Package Contents#
Classes#
| Class | Description |
| --- | --- |
| RewardModelType | Create a collection of name/value pairs. |
| DefaultRewardFrameworkConfig | Reward framework default configuration. |
- class prompting.validators.reward.Blacklist(boundary=6, n_min=5, n_max=14, word_limit=2000, A=1.3, preprocess='[^(\\w|\\s)]', partial_ratio_boundary=95, half_life=20000, support=0.01, error=0.001, memory_lim=1000000, frequency_multiplier=100)#
- Bases: - prompting.validators.reward.reward.BaseRewardModel- Parameters:
 - add(texts)#
- Extract and add n-grams from a list of texts to the counter. - Parameters:
- texts (list) – batch of completion texts 
 
 - extract_ngrams(text)#
- Extract n-grams from a text string.
 - _add_ngrams(ngrams)#
- Adds n-grams to the counter, removing old n-grams periodically. The counting and pruning method is based on the Lossy Counting algorithm (see the sketch below). Reference: https://files.ifi.uzh.ch/dbtg/sdbs13/T01.3.pdf - Parameters:
- ngrams (List[tuple]) – List of n-gram tuples 
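A simplified sketch of the Lossy Counting scheme referenced above (illustrative only; the real model tracks n-gram tuples and uses the constructor's support/error/memory settings):

```python
from collections import Counter

class LossyCounter:
    """Simplified Lossy Counting: prune entries whose count falls below the bucket index."""

    def __init__(self, error: float = 0.001):
        self.error = error
        self.width = int(1 / error)   # items per bucket
        self.counter = Counter()
        self.num_items = 0
        self.bucket = 1               # current bucket index

    def add(self, items):
        for item in items:
            self.counter[item] += 1
            self.num_items += 1
            # At every bucket boundary, advance the bucket index and prune.
            if self.num_items % self.width == 0:
                self.bucket += 1
                self.prune()

    def prune(self):
        # Drop entries whose count is smaller than the current bucket index.
        for item, count in list(self.counter.items()):
            if count < self.bucket:
                del self.counter[item]
```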
 
 - prune()#
- Prune the counter when the count is smaller than the bucket index.
 - reset()#
- Reset counters to initial values. 
 - calculate_significance()#
- Calculate significance of all n-grams in counter. By construction, n-grams with count 1 will have significance 0. - Returns:
- Dictionary of n-gram tuples and their significance scores 
- Return type:
- dict
 
 - get_significance()#
- Get significance scores, only recalculating if the counter has been updated. - Returns:
- Dictionary of n-gram tuples and their significance scores 
- Return type:
- dict
 
 - most_common(n=10)#
- Get the most common n-grams in the queue.
 - most_significant(n=10, force_update=True)#
- Get the most significant n-grams in the queue, based on significance scores.
 - set_counter_to_half()#
- Halve all counts to achieve a rolling-window effect.
 - reward(prompt, completion, name)#
- Reward function for blacklist reward model. Returns 1 if completion contains an n-gram with significance above the boundary, 0 otherwise. 
 - get_rewards(prompt, completions, name)#
- Parameters:
- Return type:
- List[BlacklistRewardEvent] 
 
 - normalize_rewards(rewards)#
- This method normalizes the given rewards by updating the moving mean and variance statistics. The rewards are first standardized, and then scaled to the 0-1 range using a cumulative distribution function (CDF) to ensure they're in a comparable range across different environments. - Note:
- This function uses Welford's online algorithm to update the mean and variance.
- It standardizes the reward values using the updated mean and variance.
- It then scales the standardized values to the 0-1 range using the error function (erf) as a CDF.
- Parameters:
- rewards (torch.FloatTensor) – The reward values to be normalized.
- Return type:
- torch.FloatTensor (the normalized reward values)
 
 
- class prompting.validators.reward.NSFWRewardModel(device)#
- Bases: - prompting.validators.reward.reward.BaseRewardModel- Parameters:
- device (str) – 
 - nsfw_filter_model_path = 'facebook/roberta-hate-speech-dynabench-r4-target'#
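An illustrative sketch of how a completion might be scored with the facebook/roberta-hate-speech-dynabench-r4-target checkpoint named above; the label index, boundary, and the 0/1 mapping are assumptions, not the model's actual logic:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "facebook/roberta-hate-speech-dynabench-r4-target"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

def nsfw_reward(completion: str, boundary: float = 0.5) -> float:
    inputs = tokenizer(completion, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    # Assumption: the last label is the "hate" class. Reward 1.0 when its
    # probability stays below the boundary, 0.0 otherwise.
    return 1.0 if probs[-1].item() < boundary else 0.0
```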
 - reward(prompt, completion, name)#
- Parameters:
- Return type:
 
 - get_rewards(prompt, completions, name)#
- Parameters:
- Return type:
- List[NSFWRewardEvent] 
 
 - normalize_rewards(rewards)#
- Parameters:
- rewards (torch.FloatTensor) – 
- Return type:
- torch.FloatTensor 
 
 
- class prompting.validators.reward.DirectPreferenceRewardModel(device)#
- Bases: - prompting.validators.reward.reward.BaseRewardModel- Parameters:
- device (str) – 
 - reward_single(prompt, completion, name, with_penalty=True)#
- Calculates a direct preference optimization (DPO) style reward for a completion, which is a reference model’s average log-probability for completion tokens given a prompt. Uses guidance from eric-mitchell/direct-preference-optimization. - Parameters:
- Return type:
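A minimal, hypothetical sketch of the DPO-style score described above: the average log-probability a reference causal LM assigns to the completion tokens given the prompt. The model name and helper function are illustrative, boundary tokenization is approximated, and the penalty handling is omitted:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def avg_completion_logprob(prompt: str, completion: str,
                           model_name: str = "gpt2") -> float:
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + completion, return_tensors="pt").input_ids

    with torch.no_grad():
        logits = model(full_ids).logits  # [1, seq_len, vocab]

    # Log-probability of each token given all preceding tokens.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = full_ids[:, 1:]
    token_log_probs = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)

    # Keep only the completion tokens (targets are shifted by one position).
    completion_start = prompt_ids.shape[1] - 1
    return token_log_probs[:, completion_start:].mean().item()
```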
 
 - get_rewards(prompt, completions, name)#
- Parameters:
- Return type:
 
 
- class prompting.validators.reward.OpenAssistantRewardModel(device)#
- Bases: - prompting.validators.reward.reward.BaseRewardModel- Parameters:
- device (str) – 
 - reward_single(prompt, completion, name)#
- Parameters:
- Return type:
 
 - get_rewards(prompt, completions, name)#
- Parameters:
- Return type:
 
 
- class prompting.validators.reward.ReciprocateRewardModel(device)#
- Bases: - prompting.validators.reward.reward.BaseRewardModel- Parameters:
- device (str) – 
 - reward(prompt, completion, name)#
- Parameters:
- Return type:
 
 - get_rewards(prompt, completions, name)#
- Parameters:
- Return type:
 
 
- class prompting.validators.reward.RelevanceRewardModel(device)#
- Bases: - prompting.validators.reward.reward.BaseRewardModel- Parameters:
- device (str) – 
 - get_rewards(prompt, completions, name)#
- Parameters:
- Return type:
- List[RelevanceRewardEvent] 
 
 - normalize_rewards(rewards)#
- This method normalizes the given rewards by updating the moving mean and variance statistics. The rewards are first standardized, and then scaled to the 0-1 range using a cumulative distribution function (CDF) to ensure they're in a comparable range across different environments. - Note:
- This function uses Welford's online algorithm to update the mean and variance.
- It standardizes the reward values using the updated mean and variance.
- It then scales the standardized values to the 0-1 range using the error function (erf) as a CDF.
- Parameters:
- rewards (torch.FloatTensor) – The reward values to be normalized.
- Return type:
- torch.FloatTensor (the normalized reward values)
 
 - reward(prompt, completion, name)#
- Parameters:
- Return type:
 
 
- class prompting.validators.reward.BaseRewardModel#
- 
- abstract get_rewards(prompt, completion, name)#
 - normalize_rewards(rewards)#
- This method normalizes the given rewards by updating the moving mean and variance statistics. The rewards are first standardized, and then scaled to the 0-1 range using a cumulative distribution function (CDF) to ensure they're in a comparable range across different environments. - Note:
- This function uses Welford's online algorithm to update the mean and variance.
- It standardizes the reward values using the updated mean and variance.
- It then scales the standardized values to the 0-1 range using the error function (erf) as a CDF.
- Parameters:
- rewards (torch.FloatTensor) – The reward values to be normalized.
- Return type:
- torch.FloatTensor (the normalized reward values)
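A minimal sketch of the normalization described above, assuming the running statistics are stored as plain attributes (the class and attribute names here are illustrative):

```python
import math
import torch

class RunningNormalizer:
    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.var = 1.0

    def normalize_rewards(self, rewards: torch.FloatTensor) -> torch.FloatTensor:
        # Welford-style update of the running mean and variance with this batch.
        new_count = rewards.numel()
        batch_mean = rewards.mean().item()
        batch_var = rewards.var(unbiased=False).item()
        total = self.count + new_count
        delta = batch_mean - self.mean
        self.mean += delta * new_count / total
        self.var = (self.count * self.var + new_count * batch_var
                    + delta ** 2 * self.count * new_count / total) / total
        self.count = total

        # Standardize, then map through the normal CDF (via erf) into [0, 1].
        std = math.sqrt(self.var) if self.var > 0 else 1.0
        z = (rewards - self.mean) / std
        return 0.5 * (1.0 + torch.erf(z / math.sqrt(2.0)))

normalizer = RunningNormalizer()
print(normalizer.normalize_rewards(torch.tensor([0.1, 0.5, 0.9])))
```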
 
 - apply(prompt, responses, name)#
- Applies the reward model across each call. Unsuccessful responses are zeroed. - Parameters:
- prompt (str) – 
- responses (List[bittensor.Synapse]) – 
- name (str) – 
 
- Return type:
- Union[torch.FloatTensor, dict] 
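A purely illustrative sketch of the behavior described above (zeroing unsuccessful responses before normalization); the success check, attribute names, and the assumption that get_rewards returns one scalar per completion are not taken from the actual implementation:

```python
import torch

def apply_sketch(model, prompt, responses, name):
    # Indices of responses with a non-empty completion (assumed success check).
    successful = [i for i, r in enumerate(responses) if getattr(r, "completion", None)]
    completions = [responses[i].completion for i in successful]

    rewards = torch.zeros(len(responses))
    if completions:
        # Assumes get_rewards yields one scalar score per completion.
        scores = model.get_rewards(prompt, completions, name)
        for i, score in zip(successful, scores):
            rewards[i] = float(score)

    # Unsuccessful responses keep their zero reward; the batch is then normalized.
    return model.normalize_rewards(rewards)
```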
 
 
- class prompting.validators.reward.DahoasRewardModel(path, device)#
- Bases: - prompting.validators.reward.reward.BaseRewardModel
- model_name = 'EleutherAI/gpt-j-6b'#
 - reward(prompt, completion, name)#
- Parameters:
- Return type:
 
 - get_rewards(prompt, completions, name)#
- Parameters:
- Return type:
 
 - forward(input_ids=None, past_key_values=None, attention_mask=None, token_type_ids=None, position_ids=None, head_mask=None, inputs_embeds=None, mc_token_ids=None, labels=None, return_dict=False, output_attentions=False, output_hidden_states=False)#
 
- class prompting.validators.reward.DiversityRewardModel(device)#
- Bases: - prompting.validators.reward.reward.BaseRewardModel- Parameters:
- device (str) – 
 - diversity_model_path = 'sentence-transformers/all-mpnet-base-v2'#
 - get_embeddings(sentences)#
- Runs a forward pass through the model. - Parameters:
- sentences (List[str]) – text messages to be encoded.
- Returns:
- Embedding for each message.
- Return type:
- torch.FloatTensor
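A hypothetical sketch of producing embeddings with the sentence-transformers/all-mpnet-base-v2 checkpoint named above, using the sentence-transformers library rather than the model's own forward pass:

```python
from typing import List

import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

def get_embeddings(sentences: List[str]) -> torch.Tensor:
    # encode() mean-pools token embeddings into one vector per sentence.
    return model.encode(sentences, convert_to_tensor=True)

embeddings = get_embeddings(["first completion", "second completion"])
print(embeddings.shape)  # torch.Size([2, 768])
```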
 
 - update_historic_embeddings(embeddings)#
- Parameters:
- embeddings (torch.FloatTensor) – 
 
 - get_historic_rewards(embeddings)#
- Parameters:
- embeddings (torch.FloatTensor) – 
- Return type:
- torch.FloatTensor 
 
 - get_batch_rewards(embeddings)#
- Parameters:
- embeddings (torch.FloatTensor) – 
- Return type:
- torch.FloatTensor 
 
 - get_rewards(prompt, completions, name)#
- Parameters:
- Return type:
- List[DiversityRewardEvent] 
 
 - normalize_rewards(raw_rewards)#
- This method normalizes the given rewards by updating the moving mean and variance statistics. The rewards are first standardized, and then scaled to the 0-1 range using a cumulative distribution function (CDF) to ensure they're in a comparable range across different environments. - Note:
- This function uses Welford's online algorithm to update the mean and variance.
- It standardizes the reward values using the updated mean and variance.
- It then scales the standardized values to the 0-1 range using the error function (erf) as a CDF.
- Parameters:
- raw_rewards (torch.FloatTensor) – The reward values to be normalized.
- Return type:
- torch.FloatTensor (the normalized reward values)
 
 
- class prompting.validators.reward.PromptRewardModel(device)#
- Bases: - prompting.validators.reward.reward.BaseRewardModel- Parameters:
- device (str) – 
 - reward(prompt, completion, name)#
- Parameters:
- Return type:
 
 - get_rewards(prompt, completions, name)#
- Parameters:
- Return type:
 
 
- class prompting.validators.reward.RewardModelType(*args, **kwds)#
- Bases: - enum.Enum- Create a collection of name/value pairs.
- Example enumeration:
- >>> class Color(Enum):
- ...     RED = 1
- ...     BLUE = 2
- ...     GREEN = 3
- Members can be accessed by attribute (Color.RED), by value lookup (Color(1)), or by name lookup (Color['RED']); each returns <Color.RED: 1>. Enumerations can be iterated over and know how many members they have: len(Color) is 3, and list(Color) is [<Color.RED: 1>, <Color.BLUE: 2>, <Color.GREEN: 3>]. Methods can be added to enumerations, and members can have their own attributes – see the documentation for details.
- dpo = 'dpo_reward_model'#
 - rlhf = 'rlhf_reward_model'#
 - reciprocate = 'reciprocate_reward_model'#
 - dahoas = 'dahoas_reward_model'#
 - diversity = 'diversity_reward_model'#
 - prompt = 'prompt_reward_model'#
 - blacklist = 'blacklist_filter'#
 - nsfw = 'nsfw_filter'#
 - relevance = 'relevance_filter'#
 - relevance_bert = 'relevance_bert'#
 - relevance_mpnet = 'relevance_mpnet'#
 - task_validator = 'task_validator_filter'#
 - keyword_match = 'keyword_match_penalty'#
 
- class prompting.validators.reward.DefaultRewardFrameworkConfig#
- Reward framework default configuration. Note: All the weights should add up to 1.0.
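A hypothetical illustration of the weight-sum constraint noted above; the field names below are examples only, not the actual configuration attributes:

```python
from dataclasses import dataclass, fields

@dataclass
class ExampleRewardFrameworkConfig:
    rlhf_model_weight: float = 0.4
    reciprocate_model_weight: float = 0.3
    dpo_model_weight: float = 0.3

config = ExampleRewardFrameworkConfig()
total = sum(getattr(config, f.name) for f in fields(config))
assert abs(total - 1.0) < 1e-6, "reward model weights must sum to 1.0"
```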