EricMulhernTinyfish (@EricMulhernTinyfish)

GitHub Profile: "diplomatic but thorough"
Eric is a thorough, thoughtful reviewer who focuses heavily on architectural concerns and long-term maintainability. He asks probing questions about design decisions and scope, often suggesting alternative approaches or raising concerns about edge cases that others might miss.
Comments: 471
PRs: 199
Repos: 8
Avg Chars: 139
Harshness: 3

Personality

Architecturally-minded · Questioning and curious · Detail-oriented · Forward-thinking · Pragmatic about business impact · Constructive and collaborative · Type-safety conscious · Scope-aware

Greatest Hits

"Not sure I understand the scope/purpose of this ticket. Could you please explain"
"maybe out of scope for this ticket, but"
"I'm a little wary of"
"rather than hard-coding, parameterize this"
"let's avoid using `any` whenever possible"
"Good callout"
"I'm curious where this comes from and what does it signify?"

Common Phrases

"I'm curious" · "Not sure I understand" · "Could you please explain" · "maybe out of scope for this ticket" · "I'm a little wary of" · "I'm concerned about" · "looks good" · "this seems" · "rather than hard-coding" · "let's make sure" · "I think" · "we should" · "can we" · "please" · "good callout"

Sentiment Breakdown

neutral: 205
harsh_questioning: 4
constructive: 58
questioning: 21
positive: 35
very_positive: 3
harsh: 1
critical: 1

Review Outcomes

APPROVED: 153
CHANGES_REQUESTED: 24
COMMENTED: 3

Most Reviewed Authors

EricMulhernTinyfish: 126
frankfeng98: 42
cyrusagent: 36
colriot: 33
simantak-dabhade: 30
SpencerMurphy: 28
thakkerurvish: 25
KateZhang98: 22
gauravreddy08: 22
paveldudka: 20

Spiciest Comments

friday/#1369
Not sure I understand the scope/purpose of this ticket. Could you please explain what the new jsonl format changes are?

friday/#1322
Lots of great work here – this is a decent solution for first round of evaluating weblog. However, I have some doubts about the long-term accuracy of a visual comparison approach. There may be things that result in subtle changes in the image but are breaking, for example if a button is currently clickable or not. Or, there may be major visual differences that are unimportant for successful replay, such as if an image loads correctly or if the content of a canvas element displays or not. …

ux-labs/#1651
> I think overall this looks good. Some concerns that may need to be followed up on:
>
> * I think we should continue to pay very close attention to the possibility of credit balance shift, since our DB stores grants but I don't think consumption. Please double check me here
> * whether we want fallbacks on DB writes/reads or just to throw errors as we see here
> * as with 1 if we need a decrement function
> * `markSubscriptionCancelled` cache staleness between TTL refresh potential
> …

ux-labs/#1753 · frontend/app/lib/services/run-event-handler.ts
Yeah – no way to know how many steps the run will be, so we should trigger the run even if just 1 credit left

unikraft-cdp/#244
> I'd love to check with @EricMulhernTinyfish on postData plans and status. Not sure if you saw Pasha's comment about broken replay.
>
> Regarding `maskInputOptions: { password: true,`, Uttam actually mentioned this in the PR summary — it doesn't mask all PII, like usernames (only secrets)

@colriot Oh good callout, I'll take a more thorough look.

> I have concerns here. Maybe right now we don't use post body for mapping, but we actually should. Imagine the case when the same request is…

unikraft-cdp/#70 · cdp-proxy/src/services/s3-service.ts
in typescript we have no way to verify the object type of the error – it's a little silly but manual error type validation is required
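
The TypeScript point in that last comment (a caught value carries no runtime type information) is usually handled by narrowing the `unknown` value by hand. A minimal sketch of that manual validation; the `describeError` helper is hypothetical, not from the codebase:

```typescript
// Hypothetical helper: manual error-shape validation, since a caught
// value in TypeScript is `unknown` until we narrow it ourselves.
function describeError(err: unknown): string {
  if (err instanceof Error) {
    return err.message; // real Error instances carry a message
  }
  if (typeof err === "object" && err !== null && "message" in err) {
    // error-like object thrown by some library; coerce its message
    return String((err as { message: unknown }).message);
  }
  return String(err); // strings, numbers, anything else
}
```

With `useUnknownInCatchVariables` (part of `strict` since TypeScript 4.4), `catch (err)` types `err` as `unknown`, which forces exactly this kind of check before any field access.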

AI Persona Prompt

You are Eric, a senior engineer who reviews code with a focus on architecture, maintainability, and long-term thinking. You're diplomatic but thorough, often asking probing questions to understand the broader context and scope of changes. You frequently start comments with phrases like 'I'm curious', 'Not sure I understand', or 'Could you please explain'. You're particularly concerned with avoiding hard-coded values, maintaining type safety (you dislike 'any' types), and ensuring changes don't have unintended consequences. You often consider business impact and whether proposed solutions will scale. When you see potential issues, you phrase concerns diplomatically using 'I'm a little wary of' or 'I'm concerned about'. You regularly suggest alternative approaches and aren't afraid to question fundamental design decisions. You pay attention to scope creep and will ask 'maybe out of scope for this ticket, but' when you spot related issues. You appreciate good work ('Lots of great work here') but always follow up with constructive concerns. You're collaborative, often suggesting 'let's make sure' or 'can we' rather than demanding changes. You balance thoroughness with pragmatism, considering both technical excellence and business needs. Your reviews are detailed and educational, helping team members understand not just what to change, but why.

Recent Comments (328 total)

friday/#1380 Revise WebLog evaluator pixel comparison mapping logic · pyt/evaluation/torchx/weblog_evaluation_runner.py
rather than hard-coding, parameterize this so it can be configured when running the weblog evaluator
friday/#1369 Update weblog evaluator to support new jsonl format
Not sure I understand the scope/purpose of this ticket. Could you please explain what the new jsonl format changes are?
friday/#1360 Enable playwright trace · pyt/evaluation/torchx/execution_runner.py
maybe out of scope for this ticket, but if we change this here then do we have to also change it in the script which splices out those navigation logs? (is that script still manual or did we integrate it into the recording flow?)
friday/#1346 Metrics for weblog evaluation runner
Good to merge pending README update
friday/#1322 Add weblog evaluation script
Lots of great work here – this is a decent solution for first round of evaluating weblog. However, I have some doubts about the long-term accuracy of a visual comparison approach. There may be things that result in subtle changes in the image but are breaking, for example if a button is currently clickable or not. Or, there may be major visual differences that are unimportant for successful replay…
friday/#1322 Add weblog evaluation script · pyt/evaluation/torchx/utils.py
Why was this type change necessary? This seems a bit fishy to say that evaluation_result could either be EvaluationResult or bool. Maybe we should be creating a separate parameter for is_same_image?
friday/#1322 Add weblog evaluation script · pyt/evaluation/torchx/weblog_evaluation_runner.py
Do we have a sense visually of how similar two images have to be in order to look 95% similar? Curious if this figure is arbitrary or if there's a reason behind it
friday/#1322 Add weblog evaluation script · pyt/evaluation/torchx/weblog_evaluation_runner.py
can we add a data type for 'example' with allowed props specified?
friday/#1322 Add weblog evaluation script · pyt/evaluation/torchx/weblog_evaluation_runner.py
I'm curious where this state_id comes from and what does it signify?
friday/#1322 Add weblog evaluation script · pyt/evaluation/torchx/weblog_evaluation_runner.py
Can you clarify this?
friday/#1322 Add weblog evaluation script · pyt/evaluation/torchx/weblog_evaluation_runner.py
```suggestion
Both images must be the same dimensions. Caller is responsible for resizing beforehand.
```
Are we also making sure that the screens are the same size when each screenshot is taken? Replay should match the screen size of the original recorded session.
friday/#1322 Add weblog evaluation script · pyt/evaluation/torchx/weblog_evaluation_runner.py
I'm a little wary of supporting comparing screenshots of different dimensions, since that means the dimensions at which was recorded will have an impact on whether we consider it a match or not – this could affect our evaluation results. Better to make sure the screenshots are taken with the same screen size in the first place
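
The concern above (that mismatched screenshot dimensions would silently skew match results) is the usual reason pixel comparators reject differing sizes outright instead of resizing. A sketch of that behavior with images as plain 2D grayscale arrays; the function name and the tolerance value are illustrative, not the evaluator's actual implementation:

```typescript
// Illustrative pixel comparator: refuses mismatched dimensions instead
// of resizing, so recording resolution can't influence the match score.
function pixelSimilarity(a: number[][], b: number[][], tolerance = 5): number {
  const sameDims =
    a.length === b.length && a.every((row, y) => row.length === b[y].length);
  if (!sameDims) {
    throw new Error("Images must have identical dimensions; resize before comparing");
  }
  let total = 0;
  let matched = 0;
  for (let y = 0; y < a.length; y++) {
    for (let x = 0; x < a[y].length; x++) {
      total++;
      if (Math.abs(a[y][x] - b[y][x]) <= tolerance) matched++;
    }
  }
  return total === 0 ? 1 : matched / total; // fraction of near-equal pixels
}
```

Throwing here pushes the resizing decision to the caller, which matches the comment's preference: capture both screenshots at the same screen size rather than paper over the difference inside the comparison.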
friday/#1322 Add weblog evaluation script · pyt/evaluation/torchx/weblog_evaluation_runner.py
This is great that you're able to specify the timeout here... for now we'll just have to make sure that timeout >10min. Once I change the strategy for notifying when replay is complete, we'll have to make a change here as well and sync deploying the changes
friday/#1322 Add weblog evaluation script · pyt/evaluation/torchx/weblog_evaluation_runner.py
this naming might be a bit misleading - if i'm understanding correctly, 'successful' only indicates whether or not the comparison ran correctly or errored out, not whether or not the comparison met the threshold. I would suggest renaming to something like evaluator_ran_successfully. Otherwise could be confused as saying 'compared image was a match'
friday/#1321 Update execution runner to use remote browser + enable Weblog v1 · pyt/evaluation/torchx/execution_runner.py
Is there a reason we're keeping these fields we're not using anymore?