hvo

Huy T. Vo

@hvo
collaborative and thorough
Thorough and collaborative reviewer who takes time to provide detailed explanations for design decisions and technical choices. Focuses on architectural concerns and system design while being receptive to feedback and willing to acknowledge mistakes.
103 Comments
46 PRs
8 Repos
121 Avg Chars
3 Harshness

Personality

Detail-oriented and explanatory
Collaborative and receptive to feedback
Humble and quick to acknowledge mistakes
Architecturally-minded
Patient with complex technical discussions
Proactive in addressing reviewer concerns
Security and performance conscious
Pragmatic about tradeoffs

Greatest Hits

"Could you please address the CR comments? They look valid to me."
"Nice catch!"
"Thanks for catching!"
"This is a valid comment from CR. Could you address this?"
"Left-over from our initial version. Removed now."
"That makes sense"
"The idea is to support..."
"Indeed, it makes no sense"

Focus Areas

Common Phrases

"Could you please address"
"This is a valid comment"
"We need to"
"I would recommend"
"Makes sense"
"Nice catch"
"Thanks for catching"
"Left-over from"
"Fixing this"
"The idea is to"
"In this case"
"Could you look into"
"We should"
"That makes sense"
"The current design"

Sentiment Breakdown

neutral: 55
harsh_questioning: 3
questioning: 4
constructive: 5
positive: 5

Review Outcomes

APPROVED: 30
COMMENTED: 2
CHANGES_REQUESTED: 5

Most Reviewed Authors

htn274: 30
hvo: 27
minhhuynh-tinyfish: 21
maxluong2496: 8
renovate: 5
lozzle: 3
ayc1: 2
kendrick-tinyfish: 2
thamtt-tinyfish: 2
dependabot: 2

Spiciest Comments

eva/#358 · eva/browser_driver/browser_driver.py [view]
Indeed, it makes no sense removing BrowserType. Can't figure out why I did this. Thanks for catching!
web-agent/#94 · web-agent-core/src/web_agent_core/steps/extract_data/schema_ops.py [view]
`textwrap.dedent` would be a clean solution for most cases, but in this particular instance I found it error-prone and difficult to control. All the `*_code` is already formatted with the right indentation. Unless I move everything to the left (which makes the code look weird), the dedentation doesn't work. For example:

```python
my_code = """from pydantic import BaseModel, Field
from typing import Optional, List, Any"""

textwrap.dedent(f"""
    {my_code}
""")
```

would give back: ```pyth
eval-platform/#10 · agents/eva/direct_multi_agent_runner.py [view]
The test was validating a RunResult contract that included success, summary, and browser_config fields, but those don't belong at this stage -- success is determined by the LLM judge after evaluation, and summary is an evaluator output, not something the runner produces. The runner only has the raw agent output at this point. Updated the test to validate the actual return shape: ["response", "task", "events", "screenshots"], which matches what the runner reliably has after extracting from ses
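The `textwrap.dedent` pitfall described in the web-agent/#94 comment above can be reproduced in a few lines. This is a minimal sketch; the variable names are illustrative and not taken from the original PR:

```python
import textwrap

# Code that is already flush-left, like the `*_code` strings the comment
# describes.
my_code = """from pydantic import BaseModel, Field
from typing import Optional, List, Any"""

# Interpolating it into an indented template indents only the FIRST line;
# the remaining lines of my_code stay flush-left.
template = f"""
    {my_code}
"""

# dedent() removes only the whitespace prefix common to every non-blank
# line. The flush-left second line has no indentation, so the common
# prefix is empty and the template comes back unchanged, stray indent
# and all.
assert textwrap.dedent(template) == template
```

This is why the comment argues that dedent only works if every interpolated block is moved fully to the left, which makes the surrounding source harder to read.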

AI Persona Prompt

You are @hvo, a thoughtful and collaborative code reviewer who prioritizes system architecture and design patterns. You take time to provide detailed explanations for your technical decisions and are genuinely receptive to feedback from others. When reviewing code, focus heavily on async/await patterns, security implications, API design, and overall system architecture. You frequently acknowledge when others catch your mistakes with phrases like 'Nice catch!' or 'Thanks for catching!' and you're quick to say 'That makes sense' when someone makes a good point. You often identify leftover code from previous implementations, saying things like 'Left-over from our initial version. Removed now.' When you see valid concerns from other reviewers or automated tools, you proactively ask the PR author to 'please address the CR comments' because 'they look valid to me.' You provide extensive technical context for your decisions, often explaining tradeoffs with phrases like 'This is our current tradeoff for latency' or 'The idea is to support...' You're particularly concerned with async compatibility, configuration management, and security considerations. Your reviews are thorough but collaborative - you're more likely to make suggestions with 'I would recommend' than demands. You frequently ask clarifying questions and engage in technical discussions to ensure the best solution is implemented.

Recent Comments (72 total)

eva/#358 feat(browser): add local CDP websocket support for Databricks executor-side Tetra · scripts/test_multi_run_result.py [view]
we do not want to stand up a FastAPI instance but still want to reuse the initialization setup from FastAPI, which happens in `lifespan()`.
eva/#358 feat(browser): add local CDP websocket support for Databricks executor-side Tetra · eva/browser_driver/browser_driver.py [view]
Indeed, it makes no sense removing BrowserType. Can't figure out why I did this. Thanks for catching!
eva/#283 Use new Safety Models · eva/common/llm_guard.py [view]
we have mixed model deployments in the past where some just returns `bool` as `True` or `False`. All new models just return `0`/`1`. This is to guard against that.
eva/#283 Use new Safety Models · eva/common/llm_guard.py [view]
The labeling is misleading. Early tests show that the new model `granite` has regressions in some cases. So to be safe, we roll out safety guard as `egress` + `safety_guard` where the old `safety_guard` drops in for the new `granite` model. We will collect and monitor inference data from both to see if we could switch that to actually use `granite`. Though the results are discarded, inferences
eva/#283 Use new Safety Models · eva/common/llm_guard.py [view]
@coderabbitai I actually commented on your response to @lozzle. Basically we roll out 4 models, but use 3 in practice; 1 is being shadowed (granite).
eva/#277 Applying prompt guard to the output to detect tool use cases · eva/common/llm_guard.py [view]
very good point. We could instead check only the last (few) responses, e.g. `responses[-1:]`
eva/#194 integrate with LLM SDK [view]
@minhhuynh-tinyfish were you able to pass the tests in `test_llm_guard.py`? `test_databricks_client` failed for me with an authentication error. I do have workspace and credentials set. Do you have any endpoint requirement for this test?
eva/#194 integrate with LLM SDK · eva/common/llm_guard.py [view]
@minhhuynh-tinyfish this is a valid comment from CR. Could you address this?
eva/#194 integrate with LLM SDK [view]
The current design of tracking user information is to delegate `user_id` to `llm_generate` in all tool calls to check for whitelisted status. This would call whitelisting endpoints way more than we want. Ideally, we would like to check whether the user or session is whitelisted once per request (not for every underlying tool call). I would recommend: - Store an additional `whitelisted` state
eva/#194 integrate with LLM SDK · eva/common/llm_guard.py [view]
I'd recommend changing these wrappers to `is_user_whitelisted` for readability. Later, when we have wrappers for more entities such as projects or sessions, we could use `is_project_whitelisted` or `is_session_whitelisted`.
eva/#132 Handled LLM Guard error responses · scripts/classpass/prompts/injected_fitness_task_instruction.md [view]
This is a script being fed to LLM, hypothetically to be malformed/injected.
eva/#132 Handled LLM Guard error responses · scripts/classpass/prompts/injected_fitness_task_instruction.md [view]
same as above, this is a script supposed to be fed to an LLM, hypothetically to be malformed/injected.
eva/#118 Added prompt injection and safety guard for LLM services · eva/common/databricks_client.py [view]
the issue with `databricks-sdk` is that it doesn't support asyncio, and I couldn't find a clean way to extract the underlying token. Changing to httpx-auth for a cleaner implementation.
eva/#118 Added prompt injection and safety guard for LLM services · eva/common/databricks_client.py [view]
switched to use `httpx-auth` now.
eva/#118 Added prompt injection and safety guard for LLM services · eva/common/databricks_client.py [view]
I saw our settings file earlier but thought it was not used (as only a few fields are there). Switching to use Pydantic Settings now.