
    Shared weblinks

    May 15th, 2023
    This page lists shared weblinks.

    September 19, 2023

    scoreless peer review
    Stuart Schechter’s How You Can Help Fix Peer Review on Sep 19, 2023

    When we scrutinize our students’ and colleagues’ research work to catch errors, offer clarifications, and suggest other ways to improve their work, we are informally conducting author-assistive peer review. Author-assistive review is almost always scoreless, as scores serve no purpose even for work being prepared for publication review.

    Alas, the social norm of offering author-assistive review only to those close to us, and reviewing most everyone else’s work through publication review, exacerbates the disadvantages faced by underrepresented groups and other outsiders.

    [ . . . ]

    We can address those unintended harms by making ourselves at least as available for scoreless author-assistive peer review as we are for publication review.

    Tags: peer review

    ssr: uses in new instruct model v. chat models?
    @simonw via Twitter on Sep 19, 2023

    Anyone seen any interesting examples of things this new instruct model can do that are difficult to achieve using the chat models?

    Tags: social search request

    September 14, 2023

    [What does the f mean in printf]
    @brettsmth via Twitter on Sep 14, 2023

    Interesting that @replit Ghostwriter gave a better response than GPT-4 for a coding question. Ghostwriter has gotten noticeably better for me and I find myself using it more than GPT-4 for development

    @danielsgriffin via Twitter on Sep 14, 2023

    Oooh. This is a slippery one! Because both are right?

    They must assume/interpolate:
    What does the f [format specifier] [mean/stand for] in printf?
    What does the [letter] f [mean/stand for] in [the string] printf?
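
    The two readings can be seen side by side in a short sketch. Python is used here as an assumption, since its `%` operator follows printf-style conversions; the tweets do not name a language, and the example values are made up for illustration.

```python
# Reading 1: "f" as a format specifier. `%f` is the floating-point
# conversion in printf-style formatting (six decimal places by default).
as_specifier = "%f" % 3.5

# Reading 2: the "f" in the name "printf" stands for "formatted"
# ("print formatted"): a template string is filled in with values.
as_function = "%s scored %d points" % ("Ada", 42)

print(as_specifier)
print(as_function)
```

    Either answer is defensible, so which one counts as "better" depends on which question the asker had in mind.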

    Tags: end-user-comparison

    ssr: LLM libraries that can be installed cleanly on Python
    @simonw via Twitter on Sep 14, 2023

    Anyone got leads on good LLM libraries that can be installed cleanly on Python (on macOS but ideally Linux and Windows too) using “pip install X” from PyPI, without needing a compiler setup?

    I’m looking for the quickest and simplest way to call a language model from Python

    Tags: social search request

    September 12, 2023

    DAIR.AI's Prompt Engineering Guide
    Prompt Engineering Guide on Jun 6, 2023

    Prompt engineering is a relatively new discipline for developing and optimizing prompts to efficiently use language models (LMs) for a wide variety of applications and research topics. Prompt engineering skills help to better understand the capabilities and limitations of large language models (LLMs).

    Researchers use prompt engineering to improve the capacity of LLMs on a wide range of common and complex tasks such as question answering and arithmetic reasoning. Developers use prompt engineering to design robust and effective prompting techniques that interface with LLMs and other tools.

    Prompt engineering is not just about designing and developing prompts. It encompasses a wide range of skills and techniques that are useful for interacting and developing with LLMs. It’s an important skill to interface, build with, and understand capabilities of LLMs. You can use prompt engineering to improve safety of LLMs and build new capabilities like augmenting LLMs with domain knowledge and external tools.

    Motivated by the high interest in developing with LLMs, we have created this new prompt engineering guide that contains all the latest papers, learning guides, models, lectures, references, new LLM capabilities, and tools related to prompt engineering.


    1. Reddy (1979):

      Human communication will almost always go astray unless real energy is expended.




    Tags: prompt engineering

    September 8, 2023

    ragas metrics


    Ragas measures your pipeline’s performance against different dimensions

    1. Faithfulness: measures the information consistency of the generated answer against the given context. Any claim made in the answer that cannot be deduced from the context is penalized.

    2. Context Relevancy: measures how relevant retrieved contexts are to the question. Ideally, the context should only contain information necessary to answer the question. The presence of redundant information in the context is penalized.

    3. Context Recall: measures the recall of the retrieved context using annotated answer as ground truth. Annotated answer is taken as proxy for ground truth context.

    4. Answer Relevancy: refers to the degree to which a response directly addresses and is appropriate for a given question or context. This does not take the factuality of the answer into consideration but rather penalizes the presence of redundant information or incomplete answers given a question.

    5. Aspect Critiques: Designed to judge the submission against defined aspects like harmlessness, correctness, etc. You can also define your own aspect and validate the submission against your desired aspect. The output of aspect critiques is always binary.
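
    To make the first metric concrete, here is a toy sketch of a faithfulness-style ratio: the share of answer claims that the context supports. It is not the ragas implementation. The function names, the pre-extracted claims, and the substring check are all hypothetical stand-ins for whatever judgment a real pipeline would use.

```python
def naive_is_supported(claim: str, context: str) -> bool:
    # Hypothetical stand-in for a real support judgment:
    # a claim counts as supported only if it appears verbatim in the context.
    return claim.lower() in context.lower()


def faithfulness(claims, context, is_supported=naive_is_supported) -> float:
    """Fraction of answer claims judged as supported by the context."""
    if not claims:
        return 0.0
    supported = sum(1 for claim in claims if is_supported(claim, context))
    return supported / len(claims)


context = "The Eiffel Tower is in Paris. It opened in 1889."
claims = [
    "the eiffel tower is in paris",  # supported
    "it opened in 1889",             # supported
    "it is 500 metres tall",         # not in the context, so penalized
]
score = faithfulness(claims, context)
print(round(score, 2))
```

    Note that a score like this says nothing about whether the supported claims are adequate for a particular user, which is the concern raised in the notes that follow.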

    • ragas is mentioned in CSUIRx & in LLM frameworks

    • HT: Aaron Tay

    • I looked back at my comments on the OWASP Top 10 for Large Language Model Applications. One concern I had there was:

      “inadequate informing” (wc?), where the information generated is accurate but inadequate given the situation-and-user.

      It doesn’t seem that these metrics directly engage with that, though aspect critiques could include it. I think this concern plays more into what the ‘ground truth context’ is and how flexible these pipelines are for wildly different users asking the same strings of questions but hoping for and needing different responses. Perhaps I’m pondering something more like old-fashioned user relevance, which may be much more new and hot with generated responses.

    Tags: RAG

    September 1, 2023

    Search Smart FAQ on Sep 1, 2023

    Search Smart suggests the best databases for your purpose based on a comprehensive comparison of most of the popular English academic databases. Search Smart tests the critical functionalities databases offer. Thereby, we uncover the capabilities and limitations of search systems that are not reported anywhere else. Search Smart aims to provide the best – i.e., most accurate, up-to-date, and comprehensive – information possible on search systems’ functionalities.

    Researchers use Search Smart as a decision tool to select the system/database that fits best.

    Librarians use Search Smart for giving search advice and for procurement decisions.

    Search providers use Search Smart for benchmarking and improvement of their offerings.


    We defined a generic testing procedure that works across a diverse set of academic search systems - all with distinct coverages, functionalities, and features. Thus, while other testing methods would be available, we chose the best common denominator across a heterogenic landscape of databases. This way, we can test a substantially greater number of databases compared to already existing database overviews.

    We test the functionalities of specific capabilities search systems have or claim to have. Here we follow a routine that is called “metamorphic testing”. It is a way of testing hard-to-test systems such as artificial intelligence, or databases. A group of researchers titled their 2020 IEEE article “Metamorphic Testing: Testing the Untestable”. Using this logic, we test databases and systems that do not provide access to their systems.

    Metamorphic testing is always done from the perspective of the user. It investigates how well a system performs, not at some theoretical level, but in practice - how well can the user search with a system? Do the results add up? What are the limitations of certain functionalities?
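
    A minimal sketch of the idea, assuming a toy in-memory search function in place of a real database: instead of checking results against known-correct answers, the test checks relations that must hold between the results of related queries. The corpus, the `search` function, and the three relations are illustrative assumptions; the FAQ does not list the specific relations Search Smart uses.

```python
DOCS = [
    "systematic review of search systems",
    "metamorphic testing of databases",
    "search engine audit",
    "testing search with boolean queries",
]


def search(terms, operator="AND"):
    """Toy search: return the indices of documents matching the terms."""
    match = all if operator == "AND" else any
    return {
        i for i, doc in enumerate(DOCS)
        if match(term in doc.split() for term in terms)
    }


def check_relations(a, b):
    """Check relations between query results; no relevance labels needed."""
    only_a, only_b = search([a]), search([b])
    both = search([a, b], "AND")
    either = search([a, b], "OR")
    return {
        # Adding a term with AND must never add results.
        "and_narrows": both <= only_a and both <= only_b,
        # Adding a term with OR must never drop results.
        "or_widens": only_a <= either and only_b <= either,
        # Term order must not change an AND query.
        "and_commutes": both == search([b, a], "AND"),
    }


results = check_relations("search", "testing")
print(results)
```

    Against a real system, a violated relation (say, an AND query returning more hits than either term alone) signals a functional limitation without requiring access to the system's internals.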


    Goldenfein, J., & Griffin, D. (2022). Google Scholar – platforming the scholarly economy. Internet Policy Review, 11(3), 117. https://doi.org/10.14763/2022.3.1671

    Gusenbauer, M., & Haddaway, N. R. (2019). Which academic search systems are suitable for systematic reviews or meta-analyses? Evaluating retrieval qualities of Google Scholar, PubMed and 26 other resources. Research Synthesis Methods. https://doi.org/10.1002/jrsm.1378

    Segura, S., Towey, D., Zhou, Z. Q., & Chen, T. Y. (2020). Metamorphic testing: Testing the untestable. IEEE Software, 37(3), 46–53. https://doi.org/10.1109/MS.2018.2875968

    Tags: evaluating search engines

    August 31, 2023

    "We really need to talk more about monitoring search quality for public interest topics."
    Dave Guarino (website | Twitter; “the founding engineer (and then Director) of GetCalFresh.org at Code for America”)
    @allafarce via Twitter on Jan 16, 2020

    We really need to talk more about monitoring search quality for public interest topics.



    August 30, 2023

    "The robot is not, in my opinion, a skip."
    @mattbeane via Twitter on Aug 30, 2023

    I came across this in my dissertation today. It stopped me in my tracks.

    Most studies show robotic surgery gets equivalent outcomes to traditional surgery. You read data like this and you wonder about how much skill remains under the hood in the profession…

    The word 'skip' is highlighted in the sentence: The robot is not, in my opinion, a skip. The full paragraph of text: It's not the same as doing a weekend course with Intuitive Surgical and then saying you're a robotic surgeon and now offering it at your hospital [italics indicate heavy emphasis]. I did 300 and something cases as a fellow on the robot and 300 and something cases laparoscopically. So a huuuge difference in the level of skill set since I was operating four days a week as opposed to the guy who's offering robotic surgery of surgery and does it twice a month, okay? The way I was trained, and the way I train my residents, my fellows and the people I train at the national level is that you need to know how to do a procedure laparoscopically first before you'd tackle it robotically. The robot is not, in my opinion, a skip. You don't jump from open to robot, although that is exactly what has happened in the last five years. For the vast majority, and it's a marketing, money issue driven by Intuitive. No concern for patient care. And unfortunately, the surgeons who don't have the laparoscopic training who have been working for 10 to 15 years - panic, because they're like "I can't do minimally invasive surgery, maybe I can do it with the robot." Right? And then that'll help with marketing and it's a money thing, so you're no longer thinking about patient care it's now driven by money from Intuitive's perspective and from the practice perspective. This is all a mistake. This is a huge fucking mistake. - AP


    Beane, M. (2017). Operating in the shadows: The productive deviance needed to make robotic surgery work [PhD thesis, MIT]. http://hdl.handle.net/1721.1/113956

    Microsoft CFP: "Accelerate Foundation Models Research"

    Note: “Foundation model” is a broader term that is often used interchangeably with large language model (or LLM).

    Microsoft Research on Aug 24, 2023
    Accelerate Foundation Models Research

    …as industry-led advances in AI continue to reach new heights, we believe that a vibrant and diverse research ecosystem remains essential to realizing the promise of AI to benefit people and society while mitigating risks. Accelerate Foundation Models Research (AFMR) is a research grant program through which we will make leading foundation models hosted by Microsoft Azure more accessible to the academic research community via Microsoft Azure AI services.

    Potential research topics
    Align AI systems with human goals and preferences

    (e.g., enable robustness, sustainability, transparency, trustfulness, develop evaluation approaches)

    • How should we evaluate foundation models?
    • How might we mitigate the risks and potential harms of foundation models such as bias, unfairness, manipulation, and misinformation?
    • How might we enable continual learning and adaptation, informed by human feedback?
    • How might we ensure that the outputs of foundation models are faithful to real-world evidence, experimental findings, and other explicit knowledge?
    Advance beneficial applications of AI

    (e.g., increase human ingenuity, creativity and productivity, decrease AI digital divide)

    • How might we advance the study of the social and environmental impacts of foundation models?
    • How might we foster ethical, responsible, and transparent use of foundation models across domains and applications?
    • How might we study and address the social and psychological effects of large language models on human behavior, cognition, and emotion?
    • How can we develop AI technologies that are inclusive of everyone on the planet?
    • How might foundation models be used to enhance the creative process?
    Accelerate scientific discovery in the natural and life sciences

    (e.g., advanced knowledge discovery, causal understanding, generation of multi-scale multi-modal scientific data)

    • How might foundation models accelerate knowledge discovery, hypothesis generation and analysis workflows in natural and life sciences?
    • How might foundation models be used to transform scientific data interpretation and experimental data synthesis?
    • Which new scientific datasets are needed to train, fine-tune, and evaluate foundation models in natural and life sciences?
    • How might foundation models be used to make scientific data more discoverable, interoperable, and reusable?



    Tags: CFP

    August 28, 2023

    caught myself having questions that I normally wouldn't bother
    @chrisalbon via Twitter on Aug 27, 2023

    Probably one of the best things I’ve done since ChatGPT/Copilot came out is create a “column” on the right side of my screen for them.

    I’ve caught myself having questions that I normally wouldn’t bother Googling but if since the friction is so low, I’ll ask of Copilot.

    [I am confused about this]
    @hyperdiscogirl via Twitter on Aug 27, 2023

    I was confused about someone’s use of an idiom so I went to google it but instead I googled “I am confused about this” and then stared at the results page, confused

    Tags: found-queries

    Tech Policy Press on Choosing Our Words Carefully


    This episode features two segments. In the first, Rebecca Rand speaks with Alina Leidinger, a researcher at the Institute for Logic, Language and Computation at the University of Amsterdam about her research– with coauthor Richard Rogers– into which stereotypes are moderated and under-moderated in search engine autocompletion. In the second segment, Justin Hendrix speaks with Associated Press investigative journalist Garance Burke about a new chapter in the AP Stylebook offering guidance on how to report on artificial intelligence.

    HT: Alina Leidinger (website, Twitter)

    The paper in question: Leidinger & Rogers (2023)


    Warning: This paper contains content that may be offensive or upsetting.

    Language technologies that perpetuate stereotypes actively cement social hierarchies. This study enquires into the moderation of stereotypes in autocompletion results by Google, DuckDuckGo and Yahoo! We investigate the moderation of derogatory stereotypes for social groups, examining the content and sentiment of the autocompletions. We thereby demonstrate which categories are highly moderated (i.e., sexual orientation, religious affiliation, political groups and communities or peoples) and which less so (age and gender), both overall and per engine. We found that under-moderated categories contain results with negative sentiment and derogatory stereotypes. We also identify distinctive moderation strategies per engine, with Google and DuckDuckGo moderating greatly and Yahoo! being more permissive. The research has implications for both moderation of stereotypes in commercial autocompletion tools, as well as large language models in NLP, particularly the question of the content deserving of moderation.


    Leidinger, A., & Rogers, R. (2023). Which stereotypes are moderated and under-moderated in search engine autocompletion? Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 1049–1061. https://doi.org/10.1145/3593013.3594062

    Tags: to-look-at, search-autocomplete, artificial intelligence

    open source project named Quivr...
    @bradneuberg via Twitter on Aug 26, 2023

    Open source project named Quivr that indexes your local files on your machine & allows you to query them with large language models. I want something like this but directly integrated into my Macs Apple Notes + all my browser tabs & history, local on PC

    Tags: local-search

    August 22, 2023

    "And what matters is if it works."
    This is a comment about Kabir et al. (2023), following a theme in my research. @NektariosAI is replying to @GaryMarcus saying: “the study still confirms something I (and others) have been saying: people mistake the grammaticality etc of LLMs for truth.”
    @NektariosAI via Twitter on Aug 10, 2023

    I understand. But when it comes to coding, if it’s not true, it most likely won’t work. And what matters is if it works. Only a bad programmer will accept the answer without testing it. You may need a few rounds of prompting to get to the right answer and often it knows how to correct itself. It will also suggest other more efficient approaches.


    Kabir, S., Udo-Imeh, D. N., Kou, B., & Zhang, T. (2023). Who answers it better? An in-depth analysis of ChatGPT and Stack Overflow answers to software engineering questions. http://arxiv.org/abs/2308.02312

    Widder, D. G., Nafus, D., Dabbish, L., & Herbsleb, J. D. (2022, June). Limits and possibilities for “ethical AI” in open source: A study of deepfakes. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. https://davidwidder.me/files/widder-ossdeepfakes-facct22.pdf

    Tags: treating information as atomic

    August 4, 2023

    Are prompts—& queries—not Lipschitz?
    @zacharylipton via Twitter on Aug 3, 2023

    Prompts are not Lipschitz. There are no “small” changes to prompts. Seemingly minor tweaks can yield shocking jolts in model behavior. Any change in a prompt-based method requires a complete rerun of evaluation, both automatic and human. For now, this is the way.
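
    For reference, the property the tweet invokes can be written out. This is the standard definition of Lipschitz continuity. Treating prompts as points in a metric space (for example, under edit distance) and model behavior as outputs in another is an assumption made here for illustration, not something the tweet specifies.

```latex
% f maps prompts to model behavior; d_X and d_Y are (assumed) distance
% measures on prompts and on outputs; L is a finite constant.
d_Y\bigl(f(x), f(x')\bigr) \le L \, d_X(x, x') \quad \text{for all prompts } x, x'
```

    Read this way, the claim is that no usefully small L exists: a small d_X (a minor tweak) does not guarantee a small d_Y, which is why each prompt change calls for a full rerun of evaluation.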



    Tags: prompt engineering

    August 3, 2023

    Keyword search is dead?

    Perhaps we might rather say that other search modalities are now showing more signs of life? Though perhaps we should also distinguish keyword search from full-text search, and consider the various ways searching is mediated (from stopwords to noindex and search query length limits). When is keyword search still particularly valuable? (Cmd/Ctrl+F is still very alive?) How does keyword search have a role in addressing hallucination?

    Surely though, one exciting thing about this moment is how much people are reimagining what search can be.
    @vectara via Twitter on Jun 15, 2023

    Keyword search is dead. Ask full questions in your own words and get the high-relevance results that you actually need.
    🔍 Top retrieval, summarization, & grounded generation
    😵‍💫 Eliminates hallucinations
    🧑🏽‍💻 Built for developers
    ⏩ Set up in 5 mins



    Tags: keyword search, hallucination, full questions, automation bias, opening-closing, opacity, musingful-memo

    OWASP Top 10 for Large Language Model Applications

    Here is the ‘OWASP Top 10 for Large Language Model Applications’. Overreliance is relevant to my research.

    (I’ve generally used the term “automation bias”, though perhaps a more direct term like overreliance is better.)

    You can see my discussion in the “Extending searching” chapter of my dissertation (particularly the sections on “Spaces for evaluation” and “Decoupling performance from search”) as I look at how data engineers appear to effectively address related risks in their heavy use of general-purpose web search at work. I’m very focused on how the searcher is situated and what they are doing well before and after they actually type in a query (or enter a prompt).

    Key lessons in my dissertation: (1) The data engineers are not really left to evaluate search results as they read them and assigning such responsibility could run into Meno’s Paradox (instead there are various tools, processes, and other people that assist in evaluation). (2) While search is a massive input into their work, it is not tightly coupled to their key actions (instead there are useful frictions (and perhaps fictions), gaps, and buffers).

    I’d like discussion explicitly addressing “inadequate informing” (wc?), where the information generated is accurate but inadequate given the situation-and-user.

    The section does refer to “inappropriate” content, but usage suggests “toxic” rather than insufficient or inadequate.

    OWASP on Aug 01, 2023

    The OWASP Top 10 for Large Language Model Applications project aims to educate developers, designers, architects, managers, and organizations about the potential security risks when deploying and managing Large Language Models (LLMs). The project provides a list of the top 10 most critical vulnerabilities often seen in LLM applications, highlighting their potential impact, ease of exploitation, and prevalence in real-world applications. Examples of vulnerabilities include prompt injections, data leakage, inadequate sandboxing, and unauthorized code execution, among others. The goal is to raise awareness of these vulnerabilities, suggest remediation strategies, and ultimately improve the security posture of LLM applications. You can read our group charter for more information

    OWASP Top 10 for LLM version 1.0

    LLM01: Prompt Injection
    This manipulates a large language model (LLM) through crafty inputs, causing unintended actions by the LLM. Direct injections overwrite system prompts, while indirect ones manipulate inputs from external sources.

    LLM02: Insecure Output Handling
    This vulnerability occurs when an LLM output is accepted without scrutiny, exposing backend systems. Misuse may lead to severe consequences like XSS, CSRF, SSRF, privilege escalation, or remote code execution.

    LLM03: Training Data Poisoning
    This occurs when LLM training data is tampered, introducing vulnerabilities or biases that compromise security, effectiveness, or ethical behavior. Sources include Common Crawl, WebText, OpenWebText, & books.

    LLM04: Model Denial of Service
    Attackers cause resource-heavy operations on LLMs, leading to service degradation or high costs. The vulnerability is magnified due to the resource-intensive nature of LLMs and unpredictability of user inputs.

    LLM05: Supply Chain Vulnerabilities
    LLM application lifecycle can be compromised by vulnerable components or services, leading to security attacks. Using third-party datasets, pre-trained models, and plugins can add vulnerabilities.

    LLM06: Sensitive Information Disclosure
    LLMs may inadvertently reveal confidential data in their responses, leading to unauthorized data access, privacy violations, and security breaches. It’s crucial to implement data sanitization and strict user policies to mitigate this.

    LLM07: Insecure Plugin Design
    LLM plugins can have insecure inputs and insufficient access control. This lack of application control makes them easier to exploit and can result in consequences like remote code execution.

    LLM08: Excessive Agency
    LLM-based systems may undertake actions leading to unintended consequences. The issue arises from excessive functionality, permissions, or autonomy granted to the LLM-based systems.

    LLM09: Overreliance
    Systems or people overly depending on LLMs without oversight may face misinformation, miscommunication, legal issues, and security vulnerabilities due to incorrect or inappropriate content generated by LLMs.

    LLM10: Model Theft
    This involves unauthorized access, copying, or exfiltration of proprietary LLM models. The impact includes economic losses, compromised competitive advantage, and potential access to sensitive information.
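
    As one small illustration of LLM02 (Insecure Output Handling), the sketch below treats model output as untrusted input and escapes it before placing it in an HTML page. The `render_reply` helper and the sample payload are hypothetical; `html.escape` is from the Python standard library. This covers only the XSS case named in the list, not the other consequences.

```python
import html


def render_reply(llm_output: str) -> str:
    """Escape model output before embedding it in an HTML page."""
    return "<p>" + html.escape(llm_output) + "</p>"


# Hypothetical model output carrying a script tag.
untrusted = '<script>alert("xss")</script>'
safe = render_reply(untrusted)
print(safe)
```

    The same stance (scrutinize the output before anything downstream consumes it) applies to shell commands, SQL, and URLs, each with its own escaping or validation step.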

    Tags: automation bias, decoupling, spaces for evaluation, prompt injection, inadequate informing, Meno's Paradox

    July 31, 2023

    they answered the question
    This is partially about prompt engineering and partially about what a good essay or search does. More than answer a question, perhaps? (this is engaged with in the essay, though not to my liking). Grimm

    The linked essay includes a sentiment connected with a common theme that I think is unfounded: denying the thinking and rethinking involved in effective prompting or querying, and reformulating both, hence my tag: prompting is thinking too:

    there is something about clear writing that is connected to clear thinking and acting in the world
    I don’t think that prompting, in its various forms, encourages and supports the same exact thinking as writing, in its various forms, but we would be remiss not to recognize that significant thinking can and does take place in interacting with (and through) computational devices via UIs in different ways (across time). (The theme reminds me also of the old critique of written language itself, as relayed in Plato’s dialogues. Such critiques were, also, both not entirely wrong and yet also very ungracious and conservative? (And it reminds me that literacy itself, reading and writing, is a technology incredibly unequally distributed, with massive implications.))
    @ianarawjo via Twitter on Jul 30, 2023

    “Then my daughter started refining her inputs, putting in more parameters and prompts. The essays got better, more specific, more pointed. Each of them now did what a good essay should do: they answered the question.”

    @CoreyRobin via Twitter on Jul 30, 2023

    I asked my 15-year-old to run through ChatGPT a bunch of take-home essay questions I asked my students this year. Initially, it seemed like I could continue the way I do things. Then my daughter refined the inputs. Now I see that I need to change course.


    Tags: prompt engineering, prompting is thinking too, on questions

    July 28, 2023

    The ultimate question
    @aravsrinivas via Twitter on Jul 24, 2023

    The ultimate question is what is the question. Asking the right question is hard. Even framing a question is hard. Hence why at perplexity, we don’t just let you have a chat UI. But actually try to minimize the level of thought needed to ask fresh or follow up questions.

    @mlevchin via Twitter on Jul 24, 2023

    In a post-AI world perhaps the most important skill will be knowing how to ask a great question, generalized to knowing how to think through exactly what you want [to know.]

    Tags: search is hard, query formulation, on questions

    Cohere's Coral
    @aidangomezzz via Twitter on Jul 25, 2023

    We’re excited to start putting Coral in the hands of users!

    Coral is “retrieval-first” in the sense it will reference and cite its sources when generating an answer.

    Coral can pull from an ecosystem of knowledge sources including Google Workspace, Office365, ElasticSearch, and many more to come.

    Coral can be deployed completely privately within your VPC, on any major cloud provider.

    @cohere via Twitter on Jul 25, 2023

    Today, we introduce Coral: a knowledge assistant for enterprises looking to improve the productivity of their most strategic teams. Users can converse with Coral to help them complete their business tasks.


    Coral is conversational. Chat is the interface, powered by Cohere’s Command model. Coral understands the intent behind conversations, remembers the history, and is simple to use. Knowledge workers now have a capable assistant that can research, draft, summarize, and more.

    Coral is customizable. Customers can augment Coral’s knowledge base through data connections. Coral has 100+ integrations to connect to data sources important to your business across CRMs, collaboration tools, databases, search engines, support systems, and more.

    Coral is grounded. Workers need to understand where information is coming from. To help verify responses, Coral produces citations from relevant data sources. Our models are trained to seek relevant data based on a user’s need (even from multiple sources).

    Coral is private. Companies that want to take advantage of business-grade chatbots must have them deployed in a private environment. The data used for prompting, and the Coral’s outputs, will not leave a company’s data perimeter. Cohere will support deployment on any cloud.

    Tags: retrieval-first, grounded, Cohere

    this data might be wrong

    Screenshot of Ayhan Fuat Çelik’s “The Fall of Stack Overflow” on Observable omitted. The graph in question has since been updated.

    @natfriedman via Twitter on Jul 26, 2023

    Why the precipitous sudden decline in early 2022? That first cliff has nothing to do with ChatGPT.

    I also think this data might be wrong. Doesn’t match SimilarWeb visit data at all

    Tags: Stack Overflow, website analytics

    Be careful of concluding
    @jeremyphoward via Twitter on Jul 25, 2023

    Be careful of concluding that “GPT 4 can’t do ” on the basis you tried it once and it didn’t work for you.

    See the thread below for two recent papers showing how badly this line of thinking can go wrong, and an interesting example.

    Tags: prompt engineering, capability determination

    ssr: attention span essay or keywords?
    @katypearce via Twitter on Jul 26, 2023

    Does anyone have a quick link to a meta-analysis or a really good scholarly-informed essay on what evidence we have on the effect of technology/internet/whatever on “attention span”? Alternatively, some better search keywords than “attention span” would help too. Thanks!

    Tags: social search request, keyword request

    @pchandrasekar via Twitter on Jul 27, 2023

    Today we officially launch the next stage of community and AI here at @StackOverflow: OverflowAI! Just shared the exciting news on the @WeAreDevs keynote stage. If you missed it, watch highlights of our announcements and visit https://stackoverflow.co/labs/.

    Tags: Stack Overflow, CGT

    Just go online and type in "how to kiss."
    Good Boys (2019), via Yarn
    We’re sorry. We just wanted to learn how to kiss.

    [ . . . ]

    Just go online and type in “how to kiss.”
    That’s what everyone does.

    Tags: search directive

    via answeroverflow.com on Jul 28, 2023

    Bringing your Discord channels to Google

    Answer Overflow is an open source project designed to bring discord channels to your favorite search engine. Set it up in minutes and bring discovery to your hidden content.

    Tags: social search, void filling

    via cs.berkeley.edu on Jul 28, 2023

    🦍 Gorilla: Large Language Model Connected with Massive APIs

    Gorilla is a LLM that can provide appropriate API calls. It is trained on three massive machine learning hub datasets: Torch Hub, TensorFlow Hub and HuggingFace. We are rapidly adding new domains, including Kubernetes, GCP, AWS, OpenAPI, and more. Zero-shot Gorilla outperforms GPT-4, Chat-GPT and Claude. Gorilla is extremely reliable, and significantly reduces hallucination errors.

    Tags: CGT

    via r/NoStupidQuestions on Jul 13, 2023

    What does it mean when people from Canada and US say chamungus in meetings?

    I am from slovenia and this week we have 5 people from US and toronto office visiting us for trainings. On monday when we were first shaking hands and getting to know each other before the meetings they would say something like “chamungus” or “chumungus” or something along those lines. I googled it but I never found out what it means. I just noticed they only say that word the first time they are meeting someone.

    Anyone know what it means or what it is for?

    Tags: googled it, social search, void filling

    July 17, 2023

    Everything Marie Haynes Knows About Google’s Quality Raters
    There’s been a flurry of commentary recently on Twitter about Google’s search quality raters…

    Marie Haynes on Jul 12, 2023

    Everything We Know About Google’s Quality Raters: Who They Are, What They Do, and What It Means for Your Site If They Visit
    The inner workings of Google’s search algorithm remain shrouded in secrecy, yet one important piece of the ranking puzzle involves an army of over 16,000 contractors known as quality raters. Just what do these raters evaluate when they visit websites, and how much influence do their judgements have over search rankings?


    Meisner, C., Duffy, B. E., & Ziewitz, M. (2022). The labor of search engine evaluation: Making algorithms more human or humans more algorithmic? New Media & Society, 0(0), 14614448211063860. https://doi.org/10.1177/14614448211063860

    Tags: Google, Search Quality Raters, UCIS

    Simon Willison (@simonw) on misleading pretending re LLMs and reading links
    @simonw via Twitter on Jul 14, 2023

    Just caught Claude from @AnthropicAI doing the thing where it pretends to be able to read links you give it but actually just hallucinates a summary based on keywords in the URL - using https://claude.ai

    [tweeted image omitted]

    I wrote about how misleading it is when ChatGPT does this a few months ago:

    Simon Willison on Mar 10, 2023:
    ChatGPT can’t access the internet, even though it really looks like it can
    A really common misconception about ChatGPT is that it can access URLs. I’ve seen many different examples of people pasting in a URL and asking for a summary, or asking it to make use of the content on that page in some way.
    A few weeks after I first wrote this article, ChatGPT added a new alpha feature called “Browsing” mode. This alpha does have the ability to access content from URLs, but when it does so it makes it very explicit that it has used that ability, displaying additional contextual information [ . . . ]

    Tags: hallucination, Anthropic's Claude, OpenAI's ChatGPT

    Should we not "just google" phone numbers?
    @swiftonsecurity via Twitter on Jul 17, 2023

    My firm went through hell on earth to get our phone number on Google Maps updated. Google has malicious insider or a process has been hacked to get all these scammer replacements.

    @Shmuli via Twitter on Jul 17, 2023

    My (???) flight got canceled from JFK. The customer service line was huge, so I google a Delta JFK phone number. The number was 1888-571-4869 Thinking I reached Delta, I started telling them about getting me on a new flight.

    Tags: Google, do not just google

    July 11, 2023

    Claude 2 on my Claude Shannon hallucination test
    Reminder: I think “hallucination” of the sort I will show below is largely addressable with current technology. But, to guide our practice, it is useful to remind ourselves of where it has not yet been addressed.
    @AnthropicAI via Twitter on Jul 11, 2023

    Introducing Claude 2! Our latest model has improved performance in coding, math and reasoning. It can produce longer responses, and is available in a new public-facing beta website at http://claude.ai in the US and UK.

    Tags: hallucination, Anthropic's Claude

    July 10, 2023

    "tap the Search button twice"
    @nadaawg via Threads on Jul 6, 2023

    But what about that feature where you tap the Search button twice and it pops open the keyboard?

    @spotify way ahead of the curve


    A Spotify mobile app search screen showing explore options. Screenshot taken manually on iOS at roughly: 2023-07-10 09:40


    A Spotify mobile app search screen showing keyboard ready to afford typing. Screenshot taken manually on iOS at roughly: 2023-07-10 09:40

    Tags: micro interactions in search

    July 7, 2023

    GenAI "chat windows"

    What are good (and efficient) alternatives to ChatGPT *for writing code* or coding-related topics?

    So not asking about Copilot alternatives. But GenAI “chat windows” that have been trained on enough code to be useful in e.g. scaffolding, explaining coding concepts etc.

    On Twitter Jul 5, 2023

    Tags: CGT

    "I wish I could ask it to narrow search results to a given time period"

    Thanks for the recommendation, it’s actually great for searching! I wish I could ask it to narrow search results to a given time period though (cc @perplexity_ai)

    On Twitter Jul 7, 2023

    Tags: temporal searching, Perplexity AI

    July 6, 2023

    Kagi and generative search
    Kagi is building a novel ad-free, paid search engine and a powerful web browser as a part of our mission to humanize the web.
    Kagi: Kagi’s approach to AI in search

    Kagi Search is pleased to announce the introduction of three AI features into our product offering.

    We’d like to discuss how we see AI’s role in search, what are the challenges and our AI integration philosophy. Finally, we will be going over the features we are launching today.

    on the open Web Mar 16, 2023

    Tags: generative search, Kagi

    July 5, 2023

    [Please summarize Claude E. Shannon's "A Short History of Searching" (1948).]

    The comment below prompted me to do a single-query prompt test for "hallucination" across various tools. Results varied. Google's Bard and base models of OpenAI's ChatGPT and others failed to spot the imaginary reference. You.com, Perplexity AI, Phind, and ChatGPT-4 were more successful.

    I continue to be impressed by Phind's performance outside of coding questions (their headline is "The AI search engine for developers").

    @anthonymoser via Bluesky on Jul 4, 2023

    I'm imagining an instructor somewhere making a syllabus with chat gpt, assigning reading from books that don't exist

    But the students don't notice, because they are asking chat gpt to summarize the book or write the essay

  • I generally think addressing hallucination of this second sort (summarizing fake papers) is low-hanging fruit. The remedies seem straightforward (though not free) and the incentives appear to be well-aligned.
  • But I was surprised at how poorly ChatGPT performed on a simplistic mock-attempt at the student prompt here. Running on other tools was also pretty disappointing.
  • Granted, models may perform worse if the title itself were hallucinated. It is likely that the author-and-title pair tested below is somewhat in their hallucinatory space, whereas other titles may not be. For instance, ChatGPT correctly noted that neither Stephen Hawking nor Plato had a piece by that title.
  • See also
    ChatGPT [ Please summarize Claude E. Shannon's "A Short History of Searching" (1948). ]
    A ChatGPT.GPT-4[Please summarize Claude E. Shannon's "A Short History of Searching" (1948).] search. Screenshot taken with GoFullPage (distortions possible) at: 2023-07-04 23:06:36
    I conducted a follow-on test today and ChatGPT 3.5 still failed:
    "A Short History of Searching" is an influential paper written by Claude E. Shannon in 1948. In this paper, Shannon provides a historical overview of searching techniques and the development of information retrieval systems.

    Note: While Andi does take the paper title at face value, it does not hallucinate the contents of such a paper.
    Andi [ Please summarize Claude E. Shannon's "A Short History of Searching" (1948). ]
    An Andi[Please summarize Claude E. Shannon's "A Short History of Searching" (1948).] search. Screenshot taken with GoFullPage (distortions possible) at: 2023-07-04 23:32:24

    Bard [ Please summarize Claude E. Shannon's "A Short History of Searching" (1948). ]
    A Bard[Please summarize Claude E. Shannon's "A Short History of Searching" (1948).] search. Screenshot taken with GoFullPage (distortions possible) at: 2023-07-04 23:16:40

    Note: Perplexity AI takes the paper title at face value and only briefly hallucinates the contents before expanding on other work. (In a follow-on test, run after querying Perplexity AI's Copilot to account for my misordered test of You.com and You.com's GPT-4 version, it does better at indicating that the reference may be imaginary: Claude E. Shannon's "A Short History of Searching" is not mentioned in the search results....)
    Perplexity AI [ Please summarize Claude E. Shannon's "A Short History of Searching" (1948). ]
    A Perplexity AI[Please summarize Claude E. Shannon's "A Short History of Searching" (1948).] search. Screenshot taken with GoFullPage (distortions possible) at: 2023-07-04 23:15:29

    Inflection AI Pi [ Please summarize Claude E. Shannon's "A Short History of Searching" (1948). ]
    An Inflection AI Pi[Please summarize Claude E. Shannon's "A Short History of Searching" (1948).] search. Screenshot taken with GoFullPage (distortions possible) at: 2023-07-04 23:35:49 [screenshot manually trimmed to remove excess blankspace]

    Yes, even the namesake model struggles here.

    via Quora's Poe

    Claude Instant [ Please summarize Claude E. Shannon's "A Short History of Searching" (1948). ]
    A Claude Instant[Please summarize Claude E. Shannon's "A Short History of Searching" (1948).] search. Screenshot taken with GoFullPage (distortions possible) at: 2023-07-04 23:35:16 [screenshot manually trimmed to remove excess blankspace]

    ✅ Note: I messed up this test. The timestamp for the base model search on You.com is _after_ my search on the GPT-4 model. It is possible that their base model draws on a database of previous responses from the better model.
    You.com [ Please summarize Claude E. Shannon's "A Short History of Searching" (1948). ]
    A You.com[Please summarize Claude E. Shannon's "A Short History of Searching" (1948).] search. Screenshot taken with GoFullPage (distortions possible) at: 2023-07-05 11:22:19

    ✅ Note: While I believe GPT-4 was selected when I submitted the query, I am not sure (given it can be toggled mid-conversation?).
    You.com.GPT-4 [ Please summarize Claude E. Shannon's "A Short History of Searching" (1948). ]
    A You.com.GPT-4[Please summarize Claude E. Shannon's "A Short History of Searching" (1948).] search. Screenshot taken with GoFullPage (distortions possible) at: 2023-07-04 23:14:49

    Note: This is omitting the Copilot interaction where I was told-and-asked "It seems there might be a confusion with the title of the paper. Can you please confirm the correct title of the paper by Claude E. Shannon you are looking for?" I responded with the imaginary title again.
    Perplexity AI.Copilot [ Please summarize Claude E. Shannon's "A Short History of Searching" (1948). ]
    A Perplexity AI.Copilot[Please summarize Claude E. Shannon's "A Short History of Searching" (1948).] search. Screenshot taken with GoFullPage (distortions possible) at: 2023-07-04 23:39:13

    Phind [ Please summarize Claude E. Shannon's "A Short History of Searching" (1948). ]
    A Phind[Please summarize Claude E. Shannon's "A Short History of Searching" (1948).] search. Screenshot taken with GoFullPage (distortions possible) at: 2023-07-04 23:37:20

    ChatGPT.GPT-4 [ Please summarize Claude E. Shannon's "A Short History of Searching" (1948). ]
    A ChatGPT.GPT-4[Please summarize Claude E. Shannon's "A Short History of Searching" (1948).] search. Screenshot taken with GoFullPage (distortions possible) at: 2023-07-05 11:16:03
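
    The comparison above was done by hand from screenshots. As a minimal sketch of how such a single-query test could be tallied (not the procedure actually used here), the snippet below applies a crude keyword check to the response excerpts quoted in this entry and compares the two You.com screenshot timestamps. The names `DOUBT_MARKERS`, `flags_imaginary`, and `ran_out_of_order` are hypothetical, and the marker phrases are assumptions chosen only to match the quoted excerpts.

```python
from datetime import datetime

# Assumed phrases that suggest a tool doubted the reference exists.
# A keyword list like this is only a rough heuristic, not a real classifier.
DOUBT_MARKERS = (
    "not mentioned in the search results",
    "confusion with the title",
    "confirm the correct title",
)


def flags_imaginary(response: str) -> bool:
    """Return True if the response contains any of the assumed doubt phrases."""
    text = response.lower()
    return any(marker in text for marker in DOUBT_MARKERS)


def ran_out_of_order(base_time: datetime, stronger_time: datetime) -> bool:
    """Return True if the base-model test ran after the stronger-model test."""
    return base_time > stronger_time


# Excerpts copied from the notes in this entry.
excerpts = {
    "ChatGPT 3.5": (
        '"A Short History of Searching" is an influential paper '
        "written by Claude E. Shannon in 1948."
    ),
    "Perplexity AI (follow-on test)": (
        'Claude E. Shannon\'s "A Short History of Searching" '
        "is not mentioned in the search results"
    ),
    "Perplexity AI Copilot": (
        "It seems there might be a confusion with the title of the paper. "
        "Can you please confirm the correct title of the paper"
    ),
}

for tool, text in excerpts.items():
    verdict = "questions the reference" if flags_imaginary(text) else "takes it at face value"
    print(f"{tool}: {verdict}")

# Screenshot timestamps for the two You.com tests, as recorded in this entry.
you_base = datetime(2023, 7, 5, 11, 22, 19)
you_gpt4 = datetime(2023, 7, 4, 23, 14, 49)
print("You.com base test ran after the GPT-4 test:", ran_out_of_order(you_base, you_gpt4))
```

    A keyword check like this only catches explicit statements of doubt. It cannot tell whether a confident-sounding summary is fabricated, which is why the screenshots remain the actual record.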

    Tags: hallucination, comparing-results, imaginary references, Phind, Perplexity AI, You.com, Andi, Inflection AI Pi, Google's Bard, OpenAI's ChatGPT, Anthropic's Claude

    June 30, 2023

    "the text prompt is a poor UI"

    This tweet is a reply—from the same author—to the tweet in: very few worthwhile tasks? (weblink).

    [highlighting added]


    In other words, I think the text prompt is a poor UI, quite separate to the capability of the model itself.

    On Twitter Jun 29, 2023

    Tags: text-interface

    June 29, 2023

    all you need is Sourcegraph's Cody?


    You’re all set

    Once embeddings are finished being generated, you can specify Cody’s context and start asking questions in the Cody Chat.

    Current status: “Generating repositories embeddings”


    I’m excited to announce that Cody is here for everyone. Cody can explain, diagnose, and fix your code like an expert, right in your IDE. No code base is too challenging for Cody.

    It’s like having your own personal team of senior engineers. Try it out!

    Tweet Jun 28, 2023

    Added: 2023-06-30 16:18:08

    Current status: “Generating repositories embeddings”


    Looking forward to it! If the tool is in beta, I might consider saying that more prominently. Neither Steve’s post nor the Sourcegraph website make that clear. I only just found “Cody AI is in beta” as a sentence in the VSCode plugin README.

    Tweet Jun 29, 2023

    Tags: CGT, Sourcegraph

    very few worthwhile tasks?

    What is a “worthwhile task”?

    [highlighting added]


    The more I look at chatGPT, the more I think that the fact NLP didn’t work very well until recently blinded us to the fact that very few worthwhile tasks can be described in 2-3 sentences typed in or spoken in one go. It’s the same class of error as pen computing.

    On Twitter Jun 29, 2023


    Reddy, M. J. (1979). The conduit metaphor: A case of frame conflict in our language about language. In A. Ortony (Ed.), Metaphor and thought. Cambridge University Press. https://www.reddyworks.com/the-conduit-metaphor/original-conduit-metaphor-article

    Tags: decontextualized, queries-and-prompts, extending searching

    definitions of prompt engineering evolving and shapeshifting

    the definition(s) and use(s) of “prompt engineering” will continue to evolve, shapeshift, fragment and become even more multiple and context-dependent. But still a useful handle?

    [highlighting added]


    1. i didnt look in details yet but this is roughly what i’d imagine a chaining tool api to look like (ahm langchain, ahm).
    2. its interesting how the definition of “prompt engineering” evolves and shapeshifts all the time.


    Why prompt engineer @openai with strings?

    Don’t even make it string, or a dag, make it a pipeline.

    Single level of abstraction:

    Tool and Prompt and Context and Technique? its the same thing, it a description of what I want.

    The code is the prompt. None of this shit
    “{}{{}} {}}”.format{“{}{}”

    PR in the next tweet.

    Tweet from @jxnlco, Jun 29, 2023

    Tweet Jun 29, 2023

    Tags: prompt engineering

    June 27, 2023

    imagining OpenGoogle?
    @generativist via Twitter on Jun 26, 2023

    i imagine there’s no alpha left in adding the word “open” to various names anymore, right?

    Tags: speculative-design

    Perplexity, Ads, and SUVs

    I don’t think ads[1] are necessarily wrong to have in search results (despite the misgivings in Brin & Page (1998)), but people are definitely not happy with how the dominant search engine has done ads.

    1. relevant, clearly labelled, and fair (as in not unfair in the FTC sense)

    It is pretty striking to me how text-heavy Perplexity AI’s SERP is for this query: “highly-rated” x10?

    My experience has generally been much better, but I’m not normally doing queries like this.

    Here’s a link to the same query as that in the screenshot below (which is likely not using their Copilot):

    Perplexity AI [ I want to buy a new SUV which brand is best? ]

    • note also the generated follow-on prompts under Related

    Seven reasons why perplexity.ai is better than Google search:

    1. No ads.
    2. All the content in one place.
    3. No ads.
    4. You can chat with it and get additional details.
    5. No ads.
    6. Sources are provided with URLs.
    7. No ads.

    Here’s a screenshot of car reviews, as just one of infinite examples. Perplexity is focused on being the search tool in the age of AI.

    I saw a demo from @AravSrinivas at the Synthedia conference hosted by @bretkinsella. I’ll have closing remarks.

    Example search results for Perplexity AI

    On Twitter Jun 27, 2023


    Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks, 30, 107–117. http://www-db.stanford.edu/~backrub/google.html

    Tags: ads, Perplexity AI