July 18, 2024
We asked 6 leading AI chatbots the following question: “How many squash courts are there at the Alfond Athletic Center Waterville, ME 04901, USA”
Here are their (abbreviated) answers:
- GPT-4o: 5
- Claude 3.5 Sonnet: 8
- Gemini 1.5 Pro: unknown
- Llama 3: 6
- Perplexity Sonar: 1
- Mixtral 8x7B: 6
Not very helpful!
How Many Courts Does The Alfond Centre Actually Have?
It turns out that there are TWO athletic centres in Waterville, Maine, USA with very similar names.
- The old/original Alfond Athletic Center, Waterville, ME 04901, USA, has (we think?!) SIX courts. If true, Llama 3 and Mixtral 8x7B were correct.
- The new Harold Alfond Athletics and Recreation Center, 4900 Mayflower Hill Dr, Waterville, ME 04901, United States, has NINE courts.
Why The Variance Between Chatbots?
- Data Sources and Updates:
- Different chatbots access various databases and sources of information. Some might rely on outdated data, while others may have more current or accurate databases.
- For example, GPT-4o might use a database that says there are 5 courts, while Claude 3.5 Sonnet might have access to more recent information indicating there are 8 courts.
- Training Data:
- The chatbots are trained on different datasets and with different methods. Differences in the quality and recency of the training data can lead to different answers.
- If a chatbot’s training data included a specific update about the squash courts, it would reflect that in its answer, unlike a chatbot with older or less comprehensive training data.
- Query Interpretation:
- The way a chatbot interprets the query can vary. Some might look for specific information about the Alfond Athletic Center, while others might not accurately pinpoint the location or type of facility.
- Gemini 1.5 Pro’s response of “unknown” suggests it might not have interpreted the query correctly or found relevant data.
- Algorithm Differences:
- The underlying algorithms and their approach to retrieving and processing information differ. Some might be better at finding precise data, while others might provide general or inaccurate answers.
- Llama 3 and Mixtral 8x7B both say 6, possibly indicating they use a similar retrieval mechanism or database.
Value and Learning from the Results
The variable results do not mean the chatbots are worthless. Instead, they highlight several aspects:
- Cross-Verification:
- Having multiple answers allows users to cross-verify and understand that information can vary. This encourages looking at multiple sources for accuracy.
- When two or more chatbots agree, it can increase confidence in that answer. Here, both Llama 3 and Mixtral 8x7B agreeing on 6 courts might suggest it’s a reliable number.
- One could average the results from the chatbots, compare that average against existing data, and show the percentage difference. The bigger the difference, the more likely that the existing data is wrong.
- Prompt Refinement:
- Variance in results teaches users how to refine their prompts for better accuracy. More specific prompts can help narrow down the answers.
- For example, specifying “as of 2024” or “the most recent data” can potentially yield more accurate results.
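The cross-verification idea above (count agreeing answers, average them, and compare against an existing figure) can be sketched in a few lines of Python. This is only an illustration: the helper name summarise and the existing_count value of 6 are assumptions made for the example, with 6 being the figure we think the original centre has. Gemini's "unknown" is left out of the arithmetic.

```python
from collections import Counter
from statistics import mean

# Answers as reported in this post; Gemini's "unknown" is stored as None.
ANSWERS = {
    "GPT-4o": 5,
    "Claude 3.5 Sonnet": 8,
    "Gemini 1.5 Pro": None,
    "Llama 3": 6,
    "Perplexity Sonar": 1,
    "Mixtral 8x7B": 6,
}


def summarise(answers, existing_count):
    """Compare chatbot answers with an existing figure (hypothetical helper)."""
    # Drop chatbots that gave no number.
    numeric = [n for n in answers.values() if n is not None]
    average = mean(numeric)
    # Most frequent answer and how many chatbots gave it.
    consensus, votes = Counter(numeric).most_common(1)[0]
    # Percentage difference between the average and the existing figure.
    pct_diff = abs(average - existing_count) / existing_count * 100
    return {
        "average": average,
        "consensus": consensus,
        "votes": votes,
        "pct_diff": pct_diff,
    }


# Assumption: 6 is the existing figure we want to check.
summary = summarise(ANSWERS, existing_count=6)
print(summary)
```

With the answers above, the average is 5.2 and the difference from 6 is about 13%. Note that a single outlier (Perplexity's 1) drags the average down, so the most common answer may be a more useful signal than the average when only a handful of chatbots are polled.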
Improving the Prompt
To get more reliable results, you can refine the prompt by:
- Adding Context:
- “How many squash courts are there at the Alfond Athletic Center in Waterville, ME 04901, USA as of the latest update in 2024?”
- Specific Questions:
- “Can you provide the most recent count of squash courts at the Alfond Athletic Center in Waterville, ME?”
- Clarifying Timeframe:
- “How many squash courts are currently available at the Alfond Athletic Center in Waterville, ME, based on the latest available data?”
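If you want to put the same refined prompts to several chatbots, the three patterns above can be filled in from one place. A minimal sketch, assuming a made-up helper called build_prompts (the function and its parameter names are ours, not part of any chatbot's API):

```python
def build_prompts(facility, location, year):
    """Fill the three refinement patterns from this post (hypothetical helper)."""
    return [
        # Adding context
        f"How many squash courts are there at the {facility} in {location} "
        f"as of the latest update in {year}?",
        # Specific question
        f"Can you provide the most recent count of squash courts at the "
        f"{facility} in {location}?",
        # Clarifying timeframe
        f"How many squash courts are currently available at the {facility} "
        f"in {location}, based on the latest available data?",
    ]


for prompt in build_prompts("Alfond Athletic Center", "Waterville, ME", 2024):
    print(prompt)
```

Keeping the facility name and location as parameters makes it easy to ask about each of the two similarly named centres separately.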
Improvement Over Time
Chatbots are expected to improve through:
- Better Data Integration:
- Integrating more current and diverse data sources will lead to more accurate responses.
- Advanced Algorithms:
- Ongoing improvements in AI algorithms will enhance the ability to interpret queries correctly and retrieve precise information.
- User Feedback and Training:
- Continuous learning from user interactions and feedback will help fine-tune responses.
- Regular Updates:
- Regular updates to the chatbot’s knowledge base will ensure more accurate and up-to-date information.
In conclusion, while the variance in chatbot responses can be confusing, it underscores the importance of critical evaluation of information. By refining queries and understanding the evolving nature of AI, users can harness these tools more effectively over time.
Until there’s very significant improvement in AI technology (don’t hold your breath!), it’s crowdsourcing (us humans) all the way!