[INFO] fetching crate smoleval 0.2.0...
[INFO] testing smoleval-0.2.0 against 1.95.0 for beta-1.96-2
[INFO] extracting crate smoleval 0.2.0 into /workspace/builds/worker-1-tc1/source
[INFO] started tweaking crates.io crate smoleval 0.2.0
[INFO] removed 0 missing tests
[INFO] finished tweaking crates.io crate smoleval 0.2.0
[INFO] tweaked toml for crates.io crate smoleval 0.2.0 written to /workspace/builds/worker-1-tc1/source/Cargo.toml
[INFO] validating manifest of crates.io crate smoleval 0.2.0 on toolchain 1.95.0
[INFO] running `Command { std: CARGO_HOME="/workspace/cargo-home" RUSTUP_HOME="/workspace/rustup-home" "/workspace/cargo-home/bin/cargo" "+1.95.0" "metadata" "--manifest-path" "Cargo.toml" "--no-deps", kill_on_drop: false }`
[INFO] crate crates.io crate smoleval 0.2.0 already has a lockfile, it will not be regenerated
[INFO] running `Command { std: CARGO_HOME="/workspace/cargo-home" RUSTUP_HOME="/workspace/rustup-home" "/workspace/cargo-home/bin/cargo" "+1.95.0" "fetch" "--manifest-path" "Cargo.toml", kill_on_drop: false }`
[INFO] [stderr]  Downloading crates ...
[INFO] [stderr]   Downloaded aws-lc-rs v1.16.1
[INFO] [stderr]   Downloaded aws-lc-sys v0.38.0
[INFO] running `Command { std: "docker" "create" "-v" "/var/lib/crater-agent-workspace/builds/worker-1-tc1/target:/opt/rustwide/target:rw,Z" "-v" "/var/lib/crater-agent-workspace/builds/worker-1-tc1/source:/opt/rustwide/workdir:ro,Z" "-v" "/var/lib/crater-agent-workspace/cargo-home:/opt/rustwide/cargo-home:ro,Z" "-v" "/var/lib/crater-agent-workspace/rustup-home:/opt/rustwide/rustup-home:ro,Z" "-e" "SOURCE_DIR=/opt/rustwide/workdir" "-e" "CARGO_TARGET_DIR=/opt/rustwide/target" "-e" "CARGO_HOME=/opt/rustwide/cargo-home" "-e" "RUSTUP_HOME=/opt/rustwide/rustup-home" "-w" "/opt/rustwide/workdir" "-m" "1610612736" "--user" "0:0" "--network" "none" "ghcr.io/rust-lang/crates-build-env/linux@sha256:d429b63d4308055ea97f60fb1d3dfca48854a00942f1bd2ad806beaf015945ec" "/opt/rustwide/cargo-home/bin/cargo" "+1.95.0" "metadata" "--no-deps" "--format-version=1", kill_on_drop: false }`
[INFO] [stdout] 73868e819149a84fced286900a7c36f77d01d5764b079c315cb6fc7dfe757679
[INFO] running `Command { std: "docker" "start" "-a" "73868e819149a84fced286900a7c36f77d01d5764b079c315cb6fc7dfe757679", kill_on_drop: false }`
[INFO] running `Command { std: "docker" "inspect" "73868e819149a84fced286900a7c36f77d01d5764b079c315cb6fc7dfe757679", kill_on_drop: false }`
[INFO] running `Command { std: "docker" "rm" "-f" "73868e819149a84fced286900a7c36f77d01d5764b079c315cb6fc7dfe757679", kill_on_drop: false }`
[INFO] [stdout] 73868e819149a84fced286900a7c36f77d01d5764b079c315cb6fc7dfe757679
[INFO] running `Command { std: "docker" "create" "-v" "/var/lib/crater-agent-workspace/builds/worker-1-tc1/target:/opt/rustwide/target:rw,Z" "-v" "/var/lib/crater-agent-workspace/builds/worker-1-tc1/source:/opt/rustwide/workdir:ro,Z" "-v" "/var/lib/crater-agent-workspace/cargo-home:/opt/rustwide/cargo-home:ro,Z" "-v" "/var/lib/crater-agent-workspace/rustup-home:/opt/rustwide/rustup-home:ro,Z" "-e" "SOURCE_DIR=/opt/rustwide/workdir" "-e" "CARGO_TARGET_DIR=/opt/rustwide/target" "-e" "CARGO_INCREMENTAL=0" "-e" "RUST_BACKTRACE=full" "-e" "RUSTFLAGS=--cap-lints=warn" "-e" "RUSTDOCFLAGS=--cap-lints=warn" "-e" "CARGO_HOME=/opt/rustwide/cargo-home" "-e" "RUSTUP_HOME=/opt/rustwide/rustup-home" "-w" "/opt/rustwide/workdir" "-m" "1610612736" "--user" "0:0" "--network" "none" "ghcr.io/rust-lang/crates-build-env/linux@sha256:d429b63d4308055ea97f60fb1d3dfca48854a00942f1bd2ad806beaf015945ec" "/opt/rustwide/cargo-home/bin/cargo" "+1.95.0" "build" "--frozen" "--message-format=json", kill_on_drop: false }`
[INFO] [stdout] eac500e2f1dc1ed1c618e13370c4481c507edead7c495abfa4304026cbb6198d
[INFO] running `Command { std: "docker" "start" "-a" "eac500e2f1dc1ed1c618e13370c4481c507edead7c495abfa4304026cbb6198d", kill_on_drop: false }`
[INFO] [stderr]    Compiling aws-lc-rs v1.16.1
[INFO] [stderr]    Compiling futures-macro v0.3.32
[INFO] [stderr]    Compiling aws-lc-sys v0.38.0
[INFO] [stderr]    Compiling zerotrie v0.2.3
[INFO] [stderr]    Compiling tinystr v0.8.2
[INFO] [stderr]    Compiling rustls v0.23.37
[INFO] [stderr]    Compiling icu_collections v2.1.1
[INFO] [stderr]    Compiling tracing v0.1.44
[INFO] [stderr]    Compiling serde v1.0.228
[INFO] [stderr]    Compiling serde_json v1.0.149
[INFO] [stderr]    Compiling h2 v0.4.13
[INFO] [stderr]    Compiling icu_locale_core v2.1.1
[INFO] [stderr]    Compiling thiserror-impl v2.0.18
[INFO] [stderr]    Compiling futures-util v0.3.32
[INFO] [stderr]    Compiling serde_yaml v0.9.34+deprecated
[INFO] [stderr]    Compiling icu_provider v2.1.1
[INFO] [stderr]    Compiling thiserror v2.0.18
[INFO] [stderr]    Compiling icu_normalizer v2.1.1
[INFO] [stderr]    Compiling icu_properties v2.1.2
[INFO] [stderr]    Compiling idna_adapter v1.2.1
[INFO] [stderr]    Compiling hyper v1.8.1
[INFO] [stderr]    Compiling idna v1.1.0
[INFO] [stderr]    Compiling url v2.5.8
[INFO] [stderr]    Compiling tower v0.5.3
[INFO] [stderr]    Compiling futures-executor v0.3.32
[INFO] [stderr]    Compiling hyper-util v0.1.20
[INFO] [stderr]    Compiling futures v0.3.32
[INFO] [stderr]    Compiling tower-http v0.6.8
[INFO] [stderr]    Compiling rustls-webpki v0.103.9
[INFO] [stderr]    Compiling tokio-rustls v0.26.4
[INFO] [stderr]    Compiling rustls-platform-verifier v0.6.2
[INFO] [stderr]    Compiling hyper-rustls v0.27.7
[INFO] [stderr]    Compiling reqwest v0.13.2
[INFO] [stderr]    Compiling smoleval v0.2.0 (/opt/rustwide/workdir)
[INFO] [stderr]     Finished `dev` profile [unoptimized + debuginfo] target(s) in 1m 26s
[INFO] running `Command { std: "docker" "inspect" "eac500e2f1dc1ed1c618e13370c4481c507edead7c495abfa4304026cbb6198d", kill_on_drop: false }`
[INFO] running `Command { std: "docker" "rm" "-f" "eac500e2f1dc1ed1c618e13370c4481c507edead7c495abfa4304026cbb6198d", kill_on_drop: false }`
[INFO] [stdout] eac500e2f1dc1ed1c618e13370c4481c507edead7c495abfa4304026cbb6198d
[INFO] running `Command { std: "docker" "create" "-v" "/var/lib/crater-agent-workspace/builds/worker-1-tc1/target:/opt/rustwide/target:rw,Z" "-v" "/var/lib/crater-agent-workspace/builds/worker-1-tc1/source:/opt/rustwide/workdir:ro,Z" "-v" "/var/lib/crater-agent-workspace/cargo-home:/opt/rustwide/cargo-home:ro,Z" "-v" "/var/lib/crater-agent-workspace/rustup-home:/opt/rustwide/rustup-home:ro,Z" "-e" "SOURCE_DIR=/opt/rustwide/workdir" "-e" "CARGO_TARGET_DIR=/opt/rustwide/target" "-e" "CARGO_INCREMENTAL=0" "-e" "RUST_BACKTRACE=full" "-e" "RUSTFLAGS=--cap-lints=warn" "-e" "RUSTDOCFLAGS=--cap-lints=warn" "-e" "CARGO_HOME=/opt/rustwide/cargo-home" "-e" "RUSTUP_HOME=/opt/rustwide/rustup-home" "-w" "/opt/rustwide/workdir" "-m" "1610612736" "--user" "0:0" "--network" "none" "ghcr.io/rust-lang/crates-build-env/linux@sha256:d429b63d4308055ea97f60fb1d3dfca48854a00942f1bd2ad806beaf015945ec" "/opt/rustwide/cargo-home/bin/cargo" "+1.95.0" "test" "--frozen" "--no-run" "--message-format=json", kill_on_drop: false }`
[INFO] [stdout] 1ce0eec8dc219d81bce4479f4317de5c57d78cca1263bd6c9da6aaf07765373d
[INFO] running `Command { std: "docker" "start" "-a" "1ce0eec8dc219d81bce4479f4317de5c57d78cca1263bd6c9da6aaf07765373d", kill_on_drop: false }`
[INFO] [stderr]    Compiling tokio-macros v2.6.1
[INFO] [stderr]    Compiling tokio v1.50.0
[INFO] [stderr]    Compiling tokio-util v0.7.18
[INFO] [stderr]    Compiling tokio-rustls v0.26.4
[INFO] [stderr]    Compiling tower v0.5.3
[INFO] [stderr]    Compiling h2 v0.4.13
[INFO] [stderr]    Compiling tower-http v0.6.8
[INFO] [stderr]    Compiling hyper v1.8.1
[INFO] [stderr]    Compiling hyper-util v0.1.20
[INFO] [stderr]    Compiling hyper-rustls v0.27.7
[INFO] [stderr]    Compiling reqwest v0.13.2
[INFO] [stderr]    Compiling smoleval v0.2.0 (/opt/rustwide/workdir)
[INFO] [stderr]     Finished `test` profile [unoptimized + debuginfo] target(s) in 30.57s
[INFO] running `Command { std: "docker" "inspect" "1ce0eec8dc219d81bce4479f4317de5c57d78cca1263bd6c9da6aaf07765373d", kill_on_drop: false }`
[INFO] running `Command { std: "docker" "rm" "-f" "1ce0eec8dc219d81bce4479f4317de5c57d78cca1263bd6c9da6aaf07765373d", kill_on_drop: false }`
[INFO] [stdout] 1ce0eec8dc219d81bce4479f4317de5c57d78cca1263bd6c9da6aaf07765373d
[INFO] running `Command { std: "docker" "create" "-v" "/var/lib/crater-agent-workspace/builds/worker-1-tc1/target:/opt/rustwide/target:rw,Z" "-v" "/var/lib/crater-agent-workspace/builds/worker-1-tc1/source:/opt/rustwide/workdir:ro,Z" "-v" "/var/lib/crater-agent-workspace/cargo-home:/opt/rustwide/cargo-home:ro,Z" "-v" "/var/lib/crater-agent-workspace/rustup-home:/opt/rustwide/rustup-home:ro,Z" "-e" "SOURCE_DIR=/opt/rustwide/workdir" "-e" "CARGO_TARGET_DIR=/opt/rustwide/target" "-e" "CARGO_INCREMENTAL=0" "-e" "RUST_BACKTRACE=full" "-e" "RUSTFLAGS=--cap-lints=warn" "-e" "RUSTDOCFLAGS=--cap-lints=warn" "-e" "CARGO_HOME=/opt/rustwide/cargo-home" "-e" "RUSTUP_HOME=/opt/rustwide/rustup-home" "-w" "/opt/rustwide/workdir" "-m" "1610612736" "--user" "0:0" "--network" "none" "ghcr.io/rust-lang/crates-build-env/linux@sha256:d429b63d4308055ea97f60fb1d3dfca48854a00942f1bd2ad806beaf015945ec" "/opt/rustwide/cargo-home/bin/cargo" "+1.95.0" "test" "--frozen", kill_on_drop: false }`
[INFO] [stdout] 8272e88fc441067903b9b5954008128d33a73fbe63ff61352c4df84a30f4e45c
[INFO] running `Command { std: "docker" "start" "-a" "8272e88fc441067903b9b5954008128d33a73fbe63ff61352c4df84a30f4e45c", kill_on_drop: false }`
[INFO] [stderr]     Finished `test` profile [unoptimized + debuginfo] target(s) in 0.28s
[INFO] [stderr]      Running unittests src/lib.rs (/opt/rustwide/target/debug/deps/smoleval-adac590990b3aa42)
[INFO] [stdout] 
[INFO] [stdout] running 93 tests
[INFO] [stdout] test agent::tests::agent_response_deserialize_no_tool_calls ... ok
[INFO] [stdout] test agent::tests::agent_response_missing_text_fails ... ok
[INFO] [stdout] test agent::tests::agent_response_serialize_roundtrip ... ok
[INFO] [stdout] test agent::tests::tool_call_deserialize_with_arguments ... ok
[INFO] [stdout] test agent::tests::tool_call_deserialize_without_arguments ... ok
[INFO] [stdout] test agent::tests::tool_call_serialize_roundtrip ... ok
[INFO] [stdout] test check::tests::check_result_build_boundary_one ... ok
[INFO] [stdout] test check::tests::check_result_build_invalid ... ok
[INFO] [stdout] test check::tests::check_result_build_boundary_zero ... ok
[INFO] [stdout] test check::tests::check_result_build_valid ... ok
[INFO] [stdout] test check::tests::check_result_fail ... ok
[INFO] [stdout] test check::tests::check_result_pass ... ok
[INFO] [stdout] test check::tests::check_result_reason_preserved ... ok
[INFO] [stdout] test check::tests::check_spec_deserialize ... ok
[INFO] [stdout] test check::tests::contains_all_empty_values ... ok
[INFO] [stdout] test check::tests::contains_all_fail ... ok
[INFO] [stdout] test check::tests::contains_all_invalid_config ... ok
[INFO] [stdout] test check::tests::contains_all_pass ... ok
[INFO] [stdout] test check::tests::contains_any_case_sensitive ... ok
[INFO] [stdout] test check::tests::contains_any_empty_values ... ok
[INFO] [stdout] test check::tests::contains_any_fail ... ok
[INFO] [stdout] test check::tests::contains_any_pass ... ok
[INFO] [stdout] test check::tests::exact_match_empty_string ... ok
[INFO] [stdout] test check::tests::contains_all_case_sensitive ... ok
[INFO] [stdout] test check::tests::exact_match_fail ... ok
[INFO] [stdout] test check::tests::exact_match_pass ... ok
[INFO] [stdout] test check::tests::not_contains_case_sensitive ... ok
[INFO] [stdout] test check::tests::not_contains_empty_values ... ok
[INFO] [stdout] test check::tests::not_contains_fail ... ok
[INFO] [stdout] test check::tests::not_contains_pass ... ok
[INFO] [stdout] test check::tests::registry_builtins_resolve ... ok
[INFO] [stdout] test check::tests::registry_custom_check ... ok
[INFO] [stdout] test check::tests::registry_default_is_empty ... ok
[INFO] [stdout] test check::tests::registry_empty_cannot_create ... ok
[INFO] [stdout] test check::tests::registry_tool_used_at_least_from_config ... ok
[INFO] [stdout] test check::tests::registry_tools_used_in_order_from_config ... ok
[INFO] [stdout] test check::tests::registry_unknown_type ... ok
[INFO] [stdout] test check::tests::tool_used_at_least_fail_insufficient ... ok
[INFO] [stdout] test check::tests::tool_used_at_least_fail_not_present ... ok
[INFO] [stdout] test check::tests::tool_used_at_least_pass_default_times ... ok
[INFO] [stdout] test check::tests::tool_used_at_least_pass_multiple ... ok
[INFO] [stdout] test check::tests::tool_used_at_least_with_params_fail ... ok
[INFO] [stdout] test check::tests::tool_used_at_least_with_params_pass ... ok
[INFO] [stdout] test check::tests::tool_used_exactly_config_requires_times ... ok
[INFO] [stdout] test check::tests::tool_used_at_most_pass_exact ... ok
[INFO] [stdout] test check::tests::tool_used_at_most_fail ... ok
[INFO] [stdout] test check::tests::tool_used_exactly_fail_too_many ... ok
[INFO] [stdout] test check::tests::tool_used_exactly_zero_pass ... ok
[INFO] [stderr]      Running tests/integration.rs (/opt/rustwide/target/debug/deps/integration-6c9a1b2f4478f2fe)
[INFO] [stdout] test check::tests::tool_used_exactly_fail_too_few ... ok
[INFO] [stdout] test check::tests::tool_used_exactly_pass ... ok
[INFO] [stdout] test check::tests::tools_used_in_order_fail_missing ... ok
[INFO] [stdout] test check::tests::tools_used_in_order_pass_exact ... ok
[INFO] [stdout] test check::tests::tools_used_in_order_fail_wrong_order ... ok
[INFO] [stdout] test check::tests::tools_used_in_order_empty_passes ... ok
[INFO] [stdout] test check::tests::tools_used_in_order_pass_with_extras ... ok
[INFO] [stdout] test check::tests::tools_used_in_order_repeated ... ok
[INFO] [stdout] test dataset::tests::from_file_nonexistent ... ok
[INFO] [stdout] test dataset::tests::check_spec_preserves_config ... ok
[INFO] [stdout] test dataset::tests::parse_empty_tests_list ... ok
[INFO] [stdout] test dataset::tests::parse_duplicate_test_names ... ok
[INFO] [stdout] test dataset::tests::parse_full_dataset ... ok
[INFO] [stdout] test agent::tests::agent_response_deserialize_full ... ok
[INFO] [stdout] test check::tests::tool_used_at_most_pass_zero_calls ... ok
[INFO] [stdout] test check::tests::tool_used_at_most_with_params ... ok
[INFO] [stdout] test dataset::tests::parse_minimal_dataset ... ok
[INFO] [stdout] test dataset::tests::parse_missing_required_field_name ... ok
[INFO] [stdout] test dataset::tests::parse_invalid_yaml ... ok
[INFO] [stdout] test dataset::tests::parse_missing_required_field_tests ... ok
[INFO] [stdout] test dataset::tests::parse_missing_required_field_prompt ... ok
[INFO] [stdout] test error::tests::agent_error_display ... ok
[INFO] [stdout] test error::tests::check_config_display ... ok
[INFO] [stdout] test error::tests::dataset_io_error_display ... ok
[INFO] [stdout] test error::tests::unknown_check_display ... ok
[INFO] [stdout] test error::tests::yaml_error_converts ... ok
[INFO] [stdout] test dataset::tests::serialize_roundtrip ... ok
[INFO] [stdout] test eval::tests::eval_report_empty ... ok
[INFO] [stdout] test error::tests::invalid_score_display ... ok
[INFO] [stdout] test error::tests::io_error_converts ... ok
[INFO] [stdout] test eval::tests::eval_report_mixed ... ok
[INFO] [stdout] test eval::tests::eval_report_all_pass ... ok
[INFO] [stdout] test eval::tests::evaluate_agent_error_propagates ... ok
[INFO] [stdout] test eval::tests::evaluate_preserves_response ... ok
[INFO] [stdout] test eval::tests::mean_score_mixed ... ok
[INFO] [stdout] test eval::tests::mean_score_partial ... ok
[INFO] [stdout] test eval::tests::evaluate_empty_dataset ... ok
[INFO] [stdout] test eval::tests::mean_score_single_pass ... ok
[INFO] [stdout] test eval::tests::evaluate_no_checks_scores_one ... ok
[INFO] [stdout] test eval::tests::run_checks_multiple_mixed ... ok
[INFO] [stdout] test eval::tests::mean_score_empty ... ok
[INFO] [stdout] test eval::tests::run_checks_single_passing ... ok
[INFO] [stdout] test eval::tests::mean_score_single_fail ... ok
[INFO] [stdout] test eval::tests::run_checks_no_checks ... ok
[INFO] [stdout] test eval::tests::run_checks_unknown_type_errors ... ok
[INFO] [stdout] 
[INFO] [stdout] test result: ok. 93 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s
[INFO] [stdout] 
[INFO] [stdout] 
[INFO] [stdout] running 28 tests
[INFO] [stdout] test contains_all_case_sensitive ... ok
[INFO] [stdout] test contains_any_fails_when_none_match ... ok
[INFO] [stdout] test evaluate_concurrent_captures_agent_errors ... ok
[INFO] [stdout] test evaluate_fail_fast_aborts_on_agent_error ... ok
[INFO] [stdout] test evaluate_concurrent_produces_same_results ... ok
[INFO] [stdout] test custom_check_registration ... ok
[INFO] [stdout] test evaluate_no_fail_fast_captures_errors ... ok
[INFO] [stdout] test evaluate_unknown_check_captured_without_fail_fast ... ok
[INFO] [stdout] test evaluate_unknown_check_fails_fast ... ok
[INFO] [stdout] test evaluation_preserves_test_case_metadata ... ok
[INFO] [stdout] test multiple_custom_checks_in_one_test ... ok
[INFO] [stdout] test load_yaml_and_evaluate ... ok
[INFO] [stdout] test on_result_callback_invoked_for_each_test ... ok
[INFO] [stdout] test parse_yaml_from_string ... ok
[INFO] [stdout] test partial_score_with_mixed_checks ... ok
[INFO] [stdout] test preflight_invalid_config_caught ... ok
[INFO] [stdout] test preflight_collects_multiple_errors ... ok
[INFO] [stdout] test preflight_catches_unknown_check_before_agent_runs ... ok
[INFO] [stdout] test test_case_labels_match_scores ... ok
[INFO] [stdout] test preflight_valid_dataset_runs_normally ... ok
[INFO] [stdout] test tool_used_at_least_with_zero_times ... ok
[INFO] [stdout] test tool_used_exactly_fail_wrong_tool ... ok
[INFO] [stdout] test tool_used_exactly_pass ... ok
[INFO] [stdout] test report_has_nonzero_duration ... ok
[INFO] [stdout] test validate_dataset_standalone ... ok
[INFO] [stdout] test report_metrics_with_mixed_outcomes ... ok
[INFO] [stdout] test response_not_contains_fails_when_present ... ok
[INFO] [stdout] test on_result_callback_invoked_concurrently ... ok
[INFO] [stdout] 
[INFO] [stdout] test result: ok. 28 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s
[INFO] [stdout] 
[INFO] [stderr]    Doc-tests smoleval
[INFO] [stdout] 
[INFO] [stdout] running 0 tests
[INFO] [stdout] 
[INFO] [stdout] test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
[INFO] [stdout] 
[INFO] running `Command { std: "docker" "inspect" "8272e88fc441067903b9b5954008128d33a73fbe63ff61352c4df84a30f4e45c", kill_on_drop: false }`
[INFO] running `Command { std: "docker" "rm" "-f" "8272e88fc441067903b9b5954008128d33a73fbe63ff61352c4df84a30f4e45c", kill_on_drop: false }`
[INFO] [stdout] 8272e88fc441067903b9b5954008128d33a73fbe63ff61352c4df84a30f4e45c
