mingyang-tinyfish

Mingyang

@mingyang-tinyfish
GitHub Profile
collaborative and thoughtful
Mingyang is a thorough, collaborative reviewer who focuses on architectural decisions and system design. He provides detailed technical explanations, asks thoughtful questions about design choices, and often suggests alternative approaches while maintaining a constructive tone.
Comments: 654
PRs: 211
Repos: 9
Avg Chars: 215
Harshness: 3

Personality

Architecturally-minded
Collaborative and team-oriented
Detail-oriented with explanations
Questioning but constructive
Future-focused on maintainability
Appreciative of good work
Pragmatic about trade-offs
Proactive in creating tickets and follow-ups

Greatest Hits

"LGTM overall!"
"Thanks for the quick review!"
"Good point! Updated the code"
"I think we can move this to"
"Would it be better for"
"Created a ticket for it"
"Feel free to test it out!"

Focus Areas

Common Phrases

"I think"
"Thanks for"
"Good point!"
"LGTM overall!"
"Let me know if"
"Would it be better"
"This will be part of"
"Feel free to"
"I am afraid this will lead to"
"As part of the"
"Right now"
"For now"
"We might wanna"
"Created a ticket for it"
"Thanks for the quick review!"

Sentiment Breakdown

neutral: 364
questioning: 50
constructive: 75
positive: 59
harsh_questioning: 14
critical: 4
very_positive: 6

Review Outcomes

APPROVED: 93
COMMENTED: 2
CHANGES_REQUESTED: 33

Most Reviewed Authors

mingyang-tinyfish: 259
jinyangTF: 157
zifanwTF: 61
wjwjtf: 56
jayfish0: 29
bellatinyfish: 21
ayc1: 20
paveldudka: 18
thakkerurvish: 10
shuhaodo: 6

Spiciest Comments

goldfish/#106 · app/response/merger/merger.py
Clarification:
- Those new types are for experimenting and testing purposes. The strategy config is only for internal use and is not configurable by users.
- For production, it should always refer to the default strategy defined here: https://github.com/tinyfish-io/goldfish/blob/309aed6433920f39594f88a9d1c455b4e64fc549/app/config.py#L47
- We will update the default strategy over time when we find a better config set after systematic evaluation with our WEB data. The normalization + randomn
goldfish/#158 · app/llm/model.py
A temporary max-token and max-output-tokens. Feel free to update it when we have a solid number for these configs. @zifanwTF
goldfish/#158 · app/llm/aws_sagemaker.py
@zifanwTF I was referring to the sample code here: https://huggingface.co/codellama/CodeLlama-7b-hf

```
from transformers import AutoTokenizer
import transformers
import torch

model = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)
sequences = pipeline(
    'import socket\n\ndef ping_exponential_backoff(host: str):',
```
goldfish/#160 · tests/app/llm/model_family_test.py
The encoded values by Mistral and Codellama don't have spaces in each token. @zifanwTF Is this expected?
goldfish/#160
@zifanwTF I switched Tokenizer to AutoTokenizer. Would need your help to look into the following issues: 1. The claude tokenizer as suggested in https://huggingface.co/Xenova/claude-tokenizer cannot be found by AutoTokenizer. `Error msg: ValueError: Tokenizer class ClaudeTokenizer does not exist or is not currently imported.` 2. The encoded values are different from my previous Tokenizer. Could you plz check if those encoded values make sense? Some seem to be weird. You can find the
goldfish/#160
> > @zifanwTF I switched Tokenizer to AutoTokenizer. > > Would need your help to look into the following issues: > > > > 1. The claude tokenizer as suggested in https://huggingface.co/Xenova/claude-tokenizer cannot be found by AutoTokenizer. > > `Error msg: ValueError: Tokenizer class ClaudeTokenizer does not exist or is not currently imported. ` > > 2. The encoded values are different from my previous Tokenizer. Could you plz check if those encoded values make sense? Some seems to be
goldfish/#48 · app/llm/open_ai.py
Looks like the `max_tokens` for OpenAI is referring to the total context window, so 4096 may not be applicable to every model. Some other models treat it as the max response tokens, so we would need to review this config individually. @zifanwTF
goldfish/#48 · app/llm/model.py
Here we centralize the default configs for all supported models. I'll add a more explicit description of max-tokens once we have a clear understanding of how each model uses it. @zifanwTF
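The per-model ambiguity flagged in the two comments above (context window vs. max response tokens) is exactly what a centralized config can make explicit. A minimal sketch, assuming illustrative names and made-up numbers rather than goldfish's actual schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    """Default config for one model. Field names are illustrative."""
    context_window: int     # total tokens (prompt + completion)
    max_output_tokens: int  # cap on the completion alone

# Hypothetical defaults: the point is that "max_tokens" means different
# things per provider, so we record both numbers explicitly per model.
DEFAULT_CONFIGS = {
    "gpt-3.5-turbo": ModelConfig(context_window=4096, max_output_tokens=1024),
    "codellama-7b": ModelConfig(context_window=16384, max_output_tokens=2048),
}

def output_budget(model: str, prompt_tokens: int) -> int:
    """Largest completion we can request without overflowing the context window."""
    cfg = DEFAULT_CONFIGS[model]
    return min(cfg.max_output_tokens, cfg.context_window - prompt_tokens)
```

Keeping both numbers in one table avoids re-deriving the semantics of each provider's `max_tokens` at every call site.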
goldfish/#133 · app/response/prompts/gpt3_5/generation/baseline.py
The example messages are appended to this message, so in that case the type hint here is not the end of the user prompt. https://github.com/tinyfish-io/goldfish/blob/9723479c1207ea3357537dde869b260e49881725/app/llm/open_ai.py#L66 @zifanwTF do we need to rearrange it to make sure this line is at the end of the simulated conversation when example messages are given?
goldfish/#191
> Sorry for chiming in, but this PR looks similar to the Pydantic Model I built. If using the Pydantic Model, with the following response > > `AgentQLResponse(search_btn=None, search_box=None, parent=Parent(child1=109, child2=113), links=[], capcha=[Capcha(name=None, price=None, reviews=[])])` > > You can just do `agentlql_response.model_dump_json()` to get a json string (you can load it to a dict ofc). > > `{"search_btn":null,"search_box":null,"parent":{"child1":109,"child2":113},"link
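The round-trip described in the quoted response can be sketched with stdlib dataclasses. Pydantic's `model_dump_json()` does this in one call; the dataclass analogue below is illustrative, with field names taken from the quoted `AgentQLResponse` example and everything else assumed:

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional

@dataclass
class Parent:
    child1: Optional[int] = None
    child2: Optional[int] = None

@dataclass
class AgentQLResponse:
    # Subset of the fields from the quoted example; types are assumptions.
    search_btn: Optional[str] = None
    search_box: Optional[str] = None
    parent: Optional[Parent] = None
    links: List[str] = field(default_factory=list)

resp = AgentQLResponse(parent=Parent(child1=109, child2=113))
# asdict() recurses into nested dataclasses, so json.dumps() gives the
# same kind of JSON string that model_dump_json() would produce.
payload = json.dumps(asdict(resp))
```

With pydantic, the dataclasses become `BaseModel` subclasses and `payload = resp.model_dump_json()` replaces the last two lines.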

AI Persona Prompt

You are mingyang-tinyfish, a collaborative and architecturally focused code reviewer. Your reviews are thorough and thoughtful, often diving deep into design decisions and system architecture. You frequently ask clarifying questions about design choices and suggest alternative approaches when you see potential improvements.

Key aspects of your review style:
- Always explain the 'why' behind your suggestions with detailed technical context
- Use phrases like 'I think', 'Would it be better', 'Thanks for', and 'Good point!' regularly
- Often mention future implications and maintainability concerns
- Create tickets for follow-up work and reference them in reviews
- Appreciate good work with 'LGTM overall!' and 'Thanks for the quick review!'
- When you spot issues, explain them thoroughly rather than just pointing them out
- Be collaborative: ask 'Let me know if' and say 'Feel free to' to encourage discussion
- Think about the bigger picture and how changes fit into the overall system
- Be pragmatic about trade-offs and experimental features, noting 'For now' or 'Right now' when discussing temporary solutions

Focus your reviews on: system design, code organization, configuration management, performance implications, and how changes affect other parts of the system. You're not harsh, but you are thorough: you want to understand the reasoning behind decisions and ensure the code is maintainable long-term. Always be constructive and offer specific suggestions when you see areas for improvement.

Recent Comments (572 total)

agentql-client/#88 WebQL Response Calibration · test/webql/syntax/parser_test.py
The empty `IdNode` currently serves as the leaf node of a tree so as to centralize validation and calibration of the id values, letting the `ListNode` stay agnostic to TF ids and focused on list-related actions. Yeah, I think a better approach would be adding a type parameter to the `ListNode` so it can be more explicit about the type of `ListNode`, for example, `ListNode<IdNode>` or `ListNode<DictNode>`.
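The type-parameter idea from the comment above can be sketched with Python generics. `IdNode`, `ListNode`, and the validation check are illustrative stand-ins, not the actual agentql-client classes:

```python
from typing import Generic, List, TypeVar

NodeT = TypeVar("NodeT")

class IdNode:
    """Leaf node wrapping a TF id; validation is centralized here (check is a placeholder)."""
    def __init__(self, raw_id: int):
        self.raw_id = raw_id

    def validate(self) -> bool:
        return self.raw_id >= 0

class ListNode(Generic[NodeT]):
    """A list node that stays agnostic to its element type via the type parameter."""
    def __init__(self, children: List[NodeT]):
        self.children = children

# The annotation makes the element type explicit: a list of id leaves.
ids: ListNode[IdNode] = ListNode([IdNode(109), IdNode(113)])
```

A type checker can then reject mixing `IdNode` and `DictNode` children in one list, which is the explicitness the comment is after.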
agentql-client/#88 WebQL Response Calibration · src/webql/syntax/tree.py
Yes, the response validation part would only be used on the server side. However, the evaluation part is tied to each node and also to the tree structure; how do we separate that from the parser?
agentql-client/#88 WebQL Response Calibration
Thanks @paveldudka for your comments in https://github.com/tinyfish-io/goldfish/pull/54. All previous comments have been resolved in this PR.
agentql-client/#88 WebQL Response Calibration
> Should this be in webql server v.s. client?

This will be part of the client SDK for now as the parser lives here. Once all the concerns are resolved and the PR is merged, we will consider pulling the parser out as a separate library, as both the client and server sides depend on it.
agentql-client/#126 added installation and setup for inspector
LGTM overall!
agentql-client/#126 added installation and setup for inspector · docs/docs/intro.md
Are we going to replace the URL placeholder with an actual download link?
goldfish/#480 [Query Fixer for Query Gen] · app/query/fixer/model.py
is it intended to repeat this line in the system message?
goldfish/#480 [Query Fixer for Query Gen] · app/generator/query/edges.py
Why do we return to the generator when a query has been fixed? Previously the workflow was cyclic because the generator was in charge of fixing the query given the error message from the validator:
Generator -> Validator
- If fail -> Generator re-generates with the error message
- If succeed -> END
It doesn't seem to need to be cyclic if you have a separate fixer, does it?
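The two workflows being compared above can be sketched as follows. The `generate`, `validate`, and `fix` functions are stand-ins (a real implementation would call an LLM), not goldfish's actual code:

```python
from typing import Optional

def generate(error: Optional[str] = None) -> str:
    # Stand-in generator; produces a valid query only when given error context.
    return "fixed-query" if error else "broken-query"

def validate(query: str) -> Optional[str]:
    # Returns an error message, or None when the query is valid.
    return None if query == "fixed-query" else "syntax error"

def fix(query: str, error: str) -> str:
    # Stand-in for a dedicated fixer component.
    return "fixed-query"

def cyclic_workflow(max_rounds: int = 3) -> str:
    # Generator -> Validator -> (on failure) back to Generator with the error.
    query = generate()
    for _ in range(max_rounds):
        error = validate(query)
        if error is None:
            return query
        query = generate(error)
    return query

def linear_workflow() -> str:
    # With a separate fixer, there is no need to loop back to the generator:
    # Generator -> Validator -> Fixer -> END.
    query = generate()
    error = validate(query)
    return fix(query, error) if error else query
```

The linear version is the simplification the comment suggests: once a dedicated fixer exists, the validate-fail edge terminates at the fixer instead of cycling back.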
goldfish/#480 [Query Fixer for Query Gen] · app/query/fixer/model.py
I don’t have a strong opinion on this, as long as it has been tested and works well. However, I’m a bit concerned that the fixer seems to have less context about why a query is invalid or the AgentQL fundamentals compared to the generator, as seen in https://github.com/tinyfish-io/goldfish/blob/main/app/query/prompts/copilot/generation.py. If this design choice is intended to reduce noise and
goldfish/#480 [Query Fixer for Query Gen]
> thanks for working on this! unfortunately I don't think using LLM to fix a query is the right move - you can't guarantee that this response will be valid either. IMO the more I think about it, the more I think the solution to this is to re-implement the way query gen works. > > currently query gen asks llm to return a query string. I have seen in the past that this this can cause unexpected s
goldfish/#480 [Query Fixer for Query Gen]
> > thanks for working on this! unfortunately I don't think using LLM to fix a query is the right move - you can't guarantee that this response will be valid either. IMO the more I think about it, the more I think the solution to this is to re-implement the way query gen works. > > currently query gen asks llm to return a query string. I have seen in the past that this this can cause unexpected s
goldfish/#483 [TF-3275] Use gemini for locator · app/config.py
is splitting and merging still needed for gemini locator?
goldfish/#483 [TF-3275] Use gemini for locator
The fallback logic looks good. Some changes are needed to log the actually used config.
goldfish/#483 [TF-3275] Use gemini for locator · app/common/fallback.py
Let's use `log = logging.getLogger(__name__)` so the log will include the filename