At a glance
Mean absolute error (MAE) of MoleBench against experiment, across the benchmark sets below:
| Property | Method | MAE | Verdict |
|---|---|---|---|
| Bond lengths | GFN2-xTB geometry | ≈ 0.007 Å | excellent |
| Dipole moment | GFN2-xTB | ≈ 0.22 D | good (slight over-prediction) |
| ¹H chemical shift | GIAO B3LYP/6-31G* | ≈ 0.15 ppm | good |
| ¹³C chemical shift | GIAO B3LYP/6-31G* | ≈ 1.5 ppm* | good |
| pKa | GFN2 ΔG + per-class scaling | ≈ 0.3 units | good (per functional group) |
| UV-Vis λmax | TD-B3LYP/6-31G* | ≈ 5 nm | good (basis-dependent) |
*excluding one gas-phase carboxylic-acid outlier (the acetic acid C=O, −9.4 ppm, in the ¹³C table below). All calculations were run on this site; you can reproduce any of them in the Studio.
The honest one-liner: MoleBench is excellent for geometry and trends, and quantitatively good for NMR, dipoles and (after a per-class re-fit) pKa. Use it to understand and compare, and reach for the literature when you need a publication-quality number.
Geometry — GFN2-xTB bond lengths
Getting the shape right is the foundation of everything else, and here MoleBench's default engine is strong: every bond length below lands within 0.013 Å of experiment, with a mean absolute error of about 0.007 Å. All but one of the bonds come out slightly short.
| Molecule | Bond | MoleBench (Å) | Experiment (Å) | Δ |
|---|---|---|---|---|
| water | O–H | 0.959 | 0.958 | +0.001 |
| methane | C–H | 1.082 | 1.087 | −0.005 |
| ethane | C–C | 1.522 | 1.535 | −0.013 |
| ethane | C–H | 1.088 | 1.094 | −0.006 |
| benzene | C–C | 1.385 | 1.397 | −0.012 |
| benzene | C–H | 1.080 | 1.084 | −0.004 |
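The headline figure for this table can be recomputed directly from its rows. The sketch below is a minimal check in plain Python, not MoleBench's own benchmarking code; the variable and function names are illustrative.

```python
# Rows copied from the bond-length table above:
# (molecule, bond, MoleBench value in Å, experimental value in Å)
BOND_LENGTHS = [
    ("water", "O-H", 0.959, 0.958),
    ("methane", "C-H", 1.082, 1.087),
    ("ethane", "C-C", 1.522, 1.535),
    ("ethane", "C-H", 1.088, 1.094),
    ("benzene", "C-C", 1.385, 1.397),
    ("benzene", "C-H", 1.080, 1.084),
]


def mean_absolute_error(rows):
    """Average of |calculated - experimental| over all rows."""
    return sum(abs(calc - exp) for _, _, calc, exp in rows) / len(rows)


def worst_case(rows):
    """Row with the largest absolute deviation from experiment."""
    return max(rows, key=lambda row: abs(row[2] - row[3]))


mae = mean_absolute_error(BOND_LENGTHS)
worst = worst_case(BOND_LENGTHS)
print(f"MAE = {mae:.3f} Å")
print(f"largest deviation: {worst[0]} {worst[1]}, {worst[2] - worst[3]:+.3f} Å")
```

With the six rows above, the MAE comes out at about 0.007 Å and the largest deviation is the ethane C–C bond (−0.013 Å).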
Dipole moments — GFN2-xTB
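The dipole tests the electronic structure, not just the shape. Trends are captured well: non-polar benzene comes out at zero, the most polar molecule (acetonitrile) comes out most polar, and the only ordering swap is the near-tie between water and chloromethane. There is a systematic over-prediction of a few tenths of a debye for most of the polar molecules, largest for acetone (+0.54 D); formaldehyde and acetonitrile are the exceptions.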
| Molecule | MoleBench (D) | Experiment (D) | Δ |
|---|---|---|---|
| benzene | 0.00 | 0.00 | 0.00 |
| formaldehyde | 2.33 | 2.33 | 0.00 |
| acetonitrile | 3.85 | 3.92 | −0.07 |
| chloromethane | 2.04 | 1.87 | +0.17 |
| methanol | 1.92 | 1.70 | +0.22 |
| dimethyl ether | 1.58 | 1.30 | +0.28 |
| ammonia | 1.78 | 1.47 | +0.31 |
| water | 2.22 | 1.85 | +0.37 |
| acetone | 3.42 | 2.88 | +0.54 |
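Two numbers summarize this table: the mean signed error (positive means over-prediction) and how many molecule pairs are ordered the same way by calculation and experiment. The following is a small sketch over the table's rows; the names are illustrative, and the pairwise check is just one simple way to quantify "trends".

```python
from itertools import combinations

# Rows copied from the dipole table above: (molecule, MoleBench D, experiment D)
DIPOLES = [
    ("benzene", 0.00, 0.00),
    ("formaldehyde", 2.33, 2.33),
    ("acetonitrile", 3.85, 3.92),
    ("chloromethane", 2.04, 1.87),
    ("methanol", 1.92, 1.70),
    ("dimethyl ether", 1.58, 1.30),
    ("ammonia", 1.78, 1.47),
    ("water", 2.22, 1.85),
    ("acetone", 3.42, 2.88),
]

errors = [calc - exp for _, calc, exp in DIPOLES]
mean_signed_error = sum(errors) / len(errors)  # > 0 means over-prediction
mean_absolute_error = sum(abs(e) for e in errors) / len(errors)

# A pair is "swapped" when calculation and experiment rank the two
# molecules in opposite order (the product of differences is negative).
swapped = [
    (a[0], b[0])
    for a, b in combinations(DIPOLES, 2)
    if (a[1] - b[1]) * (a[2] - b[2]) < 0
]
n_pairs = len(DIPOLES) * (len(DIPOLES) - 1) // 2

print(f"mean signed error   = {mean_signed_error:+.2f} D")
print(f"mean absolute error = {mean_absolute_error:.2f} D")
print(f"swapped pairs: {len(swapped)} of {n_pairs} -> {swapped}")
```

On these rows the signed error is about +0.20 D and the MAE about 0.22 D. One of the 36 pairs is swapped: water and chloromethane, whose experimental dipoles differ by only 0.02 D.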
¹³C NMR — GIAO B3LYP/6-31G* (Advanced tier)
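The quantum NMR is calibrated against experiment and performs well across a range of nearly 200 ppm, from a shielded methyl (about 21 ppm) to a deshielded carbonyl (about 206 ppm). The one clear miss is the acetic acid C=O (−9.4 ppm), a gas-phase result for a carboxylic acid compared with a solution measurement; it is excluded from the headline MAE.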
| Molecule | Carbon | MoleBench (ppm) | Experiment (ppm) | Δ |
|---|---|---|---|---|
| benzene | CH | 128.7 | 128.5 | +0.2 |
| acetone | C=O | 205.7 | 206.0 | −0.3 |
| acetone | CH₃ | 28.2 | 30.9 | −2.7 |
| toluene | C1 (ipso) | 139.2 | 137.8 | +1.4 |
| toluene | C2–C6 (avg) | 128.1 | 127.8 | +0.3 |
| toluene | CH₃ | 22.3 | 21.4 | +0.9 |
| methanol | CH₃ | 53.1 | 50.4 | +2.7 |
| ethanol | CH₂ | 61.9 | 58.4 | +3.5 |
| acetic acid | C=O | 168.7 | 178.1 | −9.4 |
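Because a single large error can dominate a small benchmark, it helps to see the MAE with and without the worst row. This sketch recomputes both from the table above; it is a plain-Python illustration with made-up names, not the benchmark's own script.

```python
# Rows copied from the 13C table above:
# (molecule, carbon, MoleBench ppm, experiment ppm)
C13_SHIFTS = [
    ("benzene", "CH", 128.7, 128.5),
    ("acetone", "C=O", 205.7, 206.0),
    ("acetone", "CH3", 28.2, 30.9),
    ("toluene", "C1", 139.2, 137.8),
    ("toluene", "C2-C6 avg", 128.1, 127.8),
    ("toluene", "CH3", 22.3, 21.4),
    ("methanol", "CH3", 53.1, 50.4),
    ("ethanol", "CH2", 61.9, 58.4),
    ("acetic acid", "C=O", 168.7, 178.1),
]


def abs_error(row):
    """Absolute deviation of the calculated shift from experiment."""
    return abs(row[2] - row[3])


def mae(rows):
    """Mean absolute error over the given rows."""
    return sum(abs_error(r) for r in rows) / len(rows)


# Identify the single worst row and recompute the MAE without it.
outlier = max(C13_SHIFTS, key=abs_error)
without_outlier = [r for r in C13_SHIFTS if r is not outlier]

print(f"outlier: {outlier[0]} {outlier[1]}")
print(f"MAE, all rows:        {mae(C13_SHIFTS):.1f} ppm")
print(f"MAE, outlier removed: {mae(without_outlier):.1f} ppm")
```

For these nine rows the outlier is the acetic acid C=O; the MAE is about 2.4 ppm with it and 1.5 ppm without it.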
¹H NMR — GIAO B3LYP/6-31G*
| Molecule | Proton | MoleBench (ppm) | Experiment (ppm) | Δ |
|---|---|---|---|---|
| ethanol | CH₃ | 1.16 | 1.21 | −0.05 |
| toluene | CH₃ | 2.32 | 2.34 | −0.02 |
| acetone | CH₃ | 1.95 | 2.09 | −0.14 |
| benzene | ArH | 7.12 | 7.26 | −0.14 |
| acetic acid | CH₃ | 1.93 | 2.10 | −0.17 |
| methanol | CH₃ | 3.66 | 3.40 | +0.26 |
| ethanol | CH₂ | 3.96 | 3.69 | +0.27 |
O–H / N–H protons are omitted: their shifts are dominated by hydrogen bonding and concentration, so a gas-phase value is not comparable to a solution measurement.
pKa — GFN2 deprotonation + per-class calibration
This one has a story. An earlier single global calibration over-predicted carboxylic-acid pKa values by 1.5–2 units, a bias this very benchmark exposed. The cause: the GFN2 deprotonation energy maps to pKa with a class-dependent slope (carboxylic acids, phenols and alcohols each follow a different line), so no single line can fit them all. We re-fit per functional-group class against a 20-acid set spanning pKa 0–17. The bias is gone, and the mean absolute error dropped to ~0.3 units:
| Acid | Class | MoleBench | Experiment | Δ |
|---|---|---|---|---|
| trifluoroacetic acid | acid | 0.1 | 0.23 | −0.1 |
| formic acid | acid | 3.3 | 3.75 | −0.5 |
| acetic acid | acid | 4.7 | 4.76 | −0.1 |
| benzoic acid | acid | 4.6 | 4.20 | +0.4 |
| p-nitrophenol | phenol | 6.4 | 7.15 | −0.8 |
| phenol | phenol | 10.1 | 9.99 | +0.1 |
| thiophenol | thiol | 6.8 | 6.62 | +0.2 |
| ethanol | alcohol | 16.1 | 16.0 | +0.1 |
| phosphoric acid | P-oxyacid | 2.0 | 2.15 | −0.2 |
| methanesulfonic acid | S-oxyacid | −2.0 | −1.9 | −0.1 |
Two acids were held out from the calibration set and then predicted blind: 2-naphthol at 9.8 (exp 9.51) and propanoic acid at 4.7 (exp 4.87). Both land within 0.3 units, which suggests the fit generalizes rather than memorizes. Acidity ranking across 17 units is reliable, and absolute values are now good to roughly ±0.5 unit for the common classes.
Two honest edges, now flagged in the tool itself. Phosphorus/sulfur oxyacids (phosphoric, phosphonic, sulfonic) were originally mis-scored as alcohols (phosphoric came out ~8 instead of ~2); they now use dedicated P- and S-oxyacid classes and carry an "approximate, strong acid" note. Amino acids are detected and labelled: the gas-phase neutral model can't form the zwitterion that dominates in water, so glycine's −COOH reads ≈3.9 rather than the measured ≈2.35 — the tool now says so up front instead of quietly handing you the wrong number.
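The idea behind the per-class re-fit can be sketched in a few lines: fit one straight line per functional-group class instead of a single line for everything. The example below is an assumption-laden illustration, not MoleBench's calibration code. The data are synthetic placeholders (not GFN2 output), the energy units are arbitrary, and all names are hypothetical.

```python
from collections import defaultdict

# SYNTHETIC placeholder data, not GFN2 results:
# (class, deprotonation energy in arbitrary units, reference pKa)
SAMPLES = [
    ("carboxylic acid", 320.0, 0.3),
    ("carboxylic acid", 330.0, 2.6),
    ("carboxylic acid", 336.0, 4.3),
    ("carboxylic acid", 340.0, 5.1),
    ("phenol", 330.0, 7.2),
    ("phenol", 340.0, 9.0),
    ("phenol", 345.0, 9.9),
    ("phenol", 350.0, 10.8),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_per_class(samples):
    """Return {class: (slope, intercept)}, one line per class."""
    grouped = defaultdict(list)
    for cls, energy, pka in samples:
        grouped[cls].append((energy, pka))
    return {cls: fit_line(*zip(*points)) for cls, points in grouped.items()}


def predict(models, cls, energy):
    """Predicted pKa from the line fitted for the given class."""
    slope, intercept = models[cls]
    return slope * energy + intercept


models = fit_per_class(SAMPLES)
g_slope, g_intercept = fit_line([s[1] for s in SAMPLES], [s[2] for s in SAMPLES])

# Compare one global line against the per-class lines on the same data.
global_mae = sum(abs(g_slope * e + g_intercept - p) for _, e, p in SAMPLES) / len(SAMPLES)
class_mae = sum(abs(predict(models, c, e) - p) for c, e, p in SAMPLES) / len(SAMPLES)
print(f"single global line: MAE = {global_mae:.2f}")
print(f"per-class lines:    MAE = {class_mae:.2f}")
```

On this synthetic set the two classes follow clearly different lines, so the single global fit leaves a much larger error than the per-class fits. A held-out check works the same way: call `predict` for a molecule that was not in `SAMPLES` and compare with its measured value.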
UV-Vis & why the basis set matters
UV-Vis is a great lesson in how method choice drives accuracy. The strong π→π* absorption of paracetamol (experimental λmax ≈ 243–249 nm) marches steadily toward experiment as the basis set improves:
| Basis set | Predicted λmax | vs exp (~244 nm) |
|---|---|---|
| STO-3G (minimal) | 212 nm | −32 |
| 3-21G (Quick tier) | 235 nm | −9 |
| 6-31G* (Advanced tier) | 240 nm | −4 |
| 6-31+G* (diffuse) | 246 nm | +2 |
This is why the Studio's Quick UV uses 3-21G and Advanced uses 6-31G*. TD-DFT is reliable for ordinary valence (π→π*, n→π*) excitations but should not be trusted for charge-transfer or Rydberg states.
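Wavelength errors are easier to compare across spectral regions when converted to energy, using E = hc/λ with hc ≈ 1239.84 eV·nm. The sketch below applies that conversion to the table above; the 244 nm reference is the one the table uses, and the names are illustrative.

```python
HC_EV_NM = 1239.84193  # Planck constant times speed of light, in eV·nm


def nm_to_ev(wavelength_nm):
    """Photon energy in eV for a wavelength in nm."""
    return HC_EV_NM / wavelength_nm


EXPERIMENT_NM = 244.0  # reference value used in the table above

# Predicted λmax for paracetamol, copied from the table above
PREDICTED_NM = {
    "STO-3G": 212.0,
    "3-21G": 235.0,
    "6-31G*": 240.0,
    "6-31+G*": 246.0,
}

energy_errors = {
    basis: nm_to_ev(lam) - nm_to_ev(EXPERIMENT_NM)
    for basis, lam in PREDICTED_NM.items()
}

for basis, lam in PREDICTED_NM.items():
    print(f"{basis:8s} {lam - EXPERIMENT_NM:+5.0f} nm  {energy_errors[basis]:+.2f} eV")
```

A predicted wavelength that is too short corresponds to an excitation energy that is too high. Here the errors are roughly +0.77, +0.19, +0.08 and −0.04 eV for the four basis sets in order.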
The instant (empirical) NMR — and its honest limits
The Quick NMR returns shifts in milliseconds using substituent-additivity rules. For the chemistry it was built for it is remarkably good — but it knows its limits, and now tells you so.
| Molecule | Works well? | Why |
|---|---|---|
| aspirin, paracetamol, toluene | yes (within ~2–3 ppm) | substituted benzenes + carbonyls + simple aliphatics: its sweet spot |
| pyridine, furan (heteroaromatic) | no | additivity has no good base values; flagged "low confidence" |
| caffeine (fused rings) | no | fused/heteroaromatic; flagged, with a "Run Advanced" button |
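To make "substituent additivity" concrete: each ring carbon starts from the benzene shift and adds one increment per substituent, chosen by whether that substituent is ipso, ortho, meta or para to the carbon. The sketch below is a generic illustration of that scheme, not MoleBench's rule set. The methyl increments are approximate values of the kind found in standard spectroscopy tables and should be treated as illustrative; the function and variable names are hypothetical.

```python
BENZENE_BASE_PPM = 128.5  # benzene 13C shift, as in the experimental column earlier

# Approximate, illustrative increments in ppm (assumption: check a reference
# table before relying on them). Keyed by position relative to the substituent.
INCREMENTS = {
    "CH3": {"ipso": 9.3, "ortho": 0.6, "meta": 0.0, "para": -3.1},
}

# Number of bonds around the ring -> positional relationship
RELATION = {0: "ipso", 1: "ortho", 2: "meta", 3: "para"}


def estimate_ring_shifts(substituents):
    """Estimate 13C shifts for the six carbons of a substituted benzene.

    substituents maps ring position (0-5) to a substituent key.
    Returns None when a substituent has no increments, which is the
    situation where an additivity estimate should be flagged as unreliable.
    """
    if any(name not in INCREMENTS for name in substituents.values()):
        return None
    shifts = []
    for carbon in range(6):
        shift = BENZENE_BASE_PPM
        for position, name in substituents.items():
            separation = abs(carbon - position)
            separation = min(separation, 6 - separation)  # wrap around the ring
            shift += INCREMENTS[name][RELATION[separation]]
        shifts.append(round(shift, 1))
    return shifts


toluene = estimate_ring_shifts({0: "CH3"})
p_xylene = estimate_ring_shifts({0: "CH3", 3: "CH3"})
unknown = estimate_ring_shifts({0: "XYZ"})  # no increments available
print(toluene)
print(p_xylene)
print(unknown)
```

With these increments the toluene ipso carbon comes out at 137.8 ppm. The scheme has nothing to offer when no base value or increment exists, which is why heteroaromatic and fused rings fall outside it.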
When the instant estimate is unreliable, MoleBench shows a warning and offers to run the real quantum calculation instead, so a fast estimate never masquerades as a trustworthy one. The Advanced (quantum) tier handles these cases well. Here it is on the kind of heteroaromatics the instant tier flags, all within 2.7 ppm of experiment:
| Molecule | Carbon | Advanced (QM) | Experiment | Δ |
|---|---|---|---|---|
| pyridine | C2/C6 | 151.8 | 149.9 | +1.9 |
| pyridine | C4 | 136.3 | 136.0 | +0.3 |
| pyridine | C3/C5 | 124.2 | 123.8 | +0.4 |
| furan | C2/C5 | 141.3 | 142.8 | −1.5 |
| furan | C3/C4 | 111.6 | 109.6 | +2.0 |
| thiophene | C2/C5 | 124.8 | 125.4 | −0.6 |
| thiophene | C3/C4 | 129.9 | 127.2 | +2.7 |
The two tiers are complementary by design: the instant estimate for speed on its sweet spot, the quantum calculation for the cases it can't reach — and the tool always tells you which one you should be using.
How to read this
- Trends are more reliable than absolutes. Comparing two similar molecules with the same method cancels most systematic error — relative answers are the safest use of any of these tools.
- The model matters as much as the method. Several of the larger errors above are gas-phase vs. solution effects, not the quantum chemistry being wrong.
- Pick the right tier. Quick tiers are for speed and exploration; Advanced (quantum) tiers are for the numbers you'll quote.
- Everything here is reproducible. Build any of these molecules in the Studio and run the same calculation yourself.
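The first point can be checked against the NMR tables above. Methanol CH₃ and ethanol CH₂ are both predicted too far downfield, by nearly the same amount, so the error in their difference is much smaller than either absolute error. A short sketch (names are illustrative):

```python
# (MoleBench ppm, experiment ppm), copied from the NMR tables above
H1 = {"methanol CH3": (3.66, 3.40), "ethanol CH2": (3.96, 3.69)}
C13 = {"methanol CH3": (53.1, 50.4), "ethanol CH2": (61.9, 58.4)}


def absolute_error(table, label):
    """Calculated minus experimental shift for one nucleus."""
    calc, exp = table[label]
    return calc - exp


def relative_error(table, a, b):
    """Error in the predicted shift difference between nuclei a and b."""
    calc_a, exp_a = table[a]
    calc_b, exp_b = table[b]
    return (calc_a - calc_b) - (exp_a - exp_b)


for name, table in (("1H", H1), ("13C", C13)):
    e_meoh = absolute_error(table, "methanol CH3")
    e_etoh = absolute_error(table, "ethanol CH2")
    e_rel = relative_error(table, "ethanol CH2", "methanol CH3")
    print(f"{name}: absolute {e_meoh:+.2f} and {e_etoh:+.2f} ppm, relative {e_rel:+.2f} ppm")
```

For ¹H the absolute errors are +0.26 and +0.27 ppm but the error in the difference is only +0.01 ppm; for ¹³C the absolute errors are +2.7 and +3.5 ppm against +0.8 ppm for the difference.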
Benchmark run on the live MoleBench compute service. Experimental values from standard reference compilations (CRC Handbook, NIST, SDBS and the primary literature). Updated 2026-06-25.