Gemini 1.5 PRO latest + CEDARScript-G edit format #1897
Closed
The new CEDARScript edit format looks promising, as it allowed Gemini 1.5 Flash to surpass Sonnet 3.5 on the refactoring benchmark.
Here we're not using architect mode, but you could say that Gemini is acting as the architect and the edit format itself (CEDARScript) is acting as the editor.

Quick comparisons
- Sonnet 3.5 + diff
- Gemini 1.5 PRO + diff-fenced (leaderboard site)
- Gemini 1.5 PRO + diff-fenced (my own tests)
- Gemini 1.5 PRO + CEDARScript
- Gemini 1.5 Flash + CEDARScript

functional_Functional__conform_to_reference_input: diff-fenced vs. cedarscript-g
See line count comparisons for some refactoring benchmark tasks.

Analysis: CEDARScript vs. Common Edit Formats in AI-Assisted Code Refactoring
The introduction of CEDARScript as an edit format for AI-assisted code refactoring has produced a substantial leap in performance, particularly with Gemini 1.5 PRO and Gemini 1.5 Flash. This analysis compares CEDARScript against traditional diff-based edit formats and finds large improvements across multiple metrics.

Overall Performance:
CEDARScript has dramatically improved the performance of Gemini models on code refactoring tasks. Paired with Gemini 1.5 PRO, it achieved a 77.5% pass rate and 86.5% well-formed cases, well ahead of both the same model's diff-fenced results (49.4% pass rate, 7.9% well-formed cases) and the highly regarded Claude 3.5 Sonnet (64.0% pass rate, 76.4% well-formed cases).
Most remarkably, the cost-effective Gemini 1.5 Flash model, when using CEDARScript, not only matched but surpassed the performance of Claude 3.5 Sonnet. With a 76.4% pass rate and an outstanding 94.4% well-formed cases, Gemini 1.5 Flash demonstrates that even a more affordable model can outperform top-tier competitors when equipped with the right tools. This breakthrough suggests that CEDARScript can level the playing field, enabling more accessible AI models to compete with and even exceed the capabilities of more expensive options in complex coding tasks.
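To make the gaps above explicit, the figures can be tabulated and compared in percentage points. This is only a small sketch that reuses the pass rates and well-formed percentages quoted in this PR; the `RESULTS` dictionary and the `gap_pp` helper are hypothetical names used for illustration, not part of aider or CEDARScript.

```python
# Pass rate and well-formed percentages as quoted in this PR (refactoring benchmark).
# RESULTS and gap_pp are illustrative names, not part of any tool's API.
RESULTS = {
    "Claude 3.5 Sonnet + diff": {"pass_rate": 64.0, "well_formed": 76.4},
    "Gemini 1.5 PRO + diff-fenced": {"pass_rate": 49.4, "well_formed": 7.9},
    "Gemini 1.5 PRO + CEDARScript": {"pass_rate": 77.5, "well_formed": 86.5},
    "Gemini 1.5 Flash + CEDARScript": {"pass_rate": 76.4, "well_formed": 94.4},
}


def gap_pp(a: str, b: str, metric: str) -> float:
    """Return RESULTS[a][metric] - RESULTS[b][metric] in percentage points, rounded to 1 decimal."""
    return round(RESULTS[a][metric] - RESULTS[b][metric], 1)


if __name__ == "__main__":
    baseline = "Claude 3.5 Sonnet + diff"
    for name in RESULTS:
        if name == baseline:
            continue
        # Positive values mean the configuration beats Sonnet on that metric.
        print(
            f"{name}: pass rate {gap_pp(name, baseline, 'pass_rate'):+.1f} pp, "
            f"well-formed {gap_pp(name, baseline, 'well_formed'):+.1f} pp vs. Sonnet"
        )
```

Against Sonnet, Gemini 1.5 PRO with CEDARScript gains 13.5 points in pass rate and Gemini 1.5 Flash gains 12.4; between the two Gemini models, `gap_pp("Gemini 1.5 Flash + CEDARScript", "Gemini 1.5 PRO + CEDARScript", ...)` gives -1.1 points in pass rate and +7.9 points in well-formed cases.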
Code Quality and Accuracy:
The higher pass rates and well-formed percentages suggest that CEDARScript enables AI models to produce more accurate, syntactically correct, and well-structured code modifications.
Efficiency and Resource Utilization:
Examining the "functional_Functional__conform_to_reference_input" test case:
Across the full benchmark, CEDARScript with Gemini 1.5 PRO reduced the average time per case from 110.1 seconds (diff-fenced) to 29.0 seconds, a 73.7% reduction. Gemini 1.5 Flash with CEDARScript brought this down further to 14.7 seconds, an 86.6% reduction relative to the original diff-fenced run.
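The percentages above are relative reductions in average time per case, measured against the 110.1-second diff-fenced baseline. A short sketch of that arithmetic follows; the function names are illustrative, and the speedup factor is an extra derived view of the same numbers, not a figure from the original measurements.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, expressed as a percentage."""
    return (before - after) / before * 100


def speedup(before: float, after: float) -> float:
    """How many times faster `after` is than `before` (ratio of average times)."""
    return before / after


# Average seconds per case, as quoted in this PR.
DIFF_FENCED_PRO = 110.1
CEDARSCRIPT_PRO = 29.0
CEDARSCRIPT_FLASH = 14.7

if __name__ == "__main__":
    print(f"PRO:   {percent_reduction(DIFF_FENCED_PRO, CEDARSCRIPT_PRO):.1f}% less time, "
          f"{speedup(DIFF_FENCED_PRO, CEDARSCRIPT_PRO):.1f}x faster")  # 73.7%, 3.8x
    print(f"Flash: {percent_reduction(DIFF_FENCED_PRO, CEDARSCRIPT_FLASH):.1f}% less time, "
          f"{speedup(DIFF_FENCED_PRO, CEDARSCRIPT_FLASH):.1f}x faster")  # 86.6%, 7.5x
```

Expressed as a speedup, a 73.7% reduction in time means roughly 3.8x faster, and an 86.6% reduction means roughly 7.5x faster.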
Robustness and Reliability:
While the number of error outputs increased with CEDARScript, the number of malformed responses decreased significantly:
This suggests that while CEDARScript may generate more error outputs, it produces fewer malformed responses, potentially indicating more precise error handling and feedback.
Scalability and Cost-Effectiveness:
CEDARScript demonstrated impressive cost savings:
This cost reduction, combined with faster processing times, suggests that the approach should scale well to larger, more complex refactoring tasks.
Model Comparison:
Compared with Gemini 1.5 PRO (both using CEDARScript), Gemini 1.5 Flash showed a slightly lower pass rate (76.4% vs. 77.5%) but a higher share of well-formed cases (94.4% vs. 86.5%). The Flash model was also cheaper and faster, making it an attractive option for many use cases.
Conclusion:
CEDARScript has shown significant improvements for AI-assisted code refactoring.
By lowering cost and improving accuracy, efficiency, and reliability across different models, it addresses many of the challenges associated with traditional diff-based formats.
The consistent performance boost across various metrics indicates that CEDARScript could be an important enabler for AI models to handle complex code transformations more effectively.
These results could have positive implications for developer productivity, code quality, and the future of AI-assisted software development.