Macpreneur

Unlock the Power of AI Beyond Text and Images

December 28, 2023 Damien Schreurs: Certified Apple Teacher | Business owner | Solopreneur | Trainer | Coach Season 3 Episode 77

This episode explores the various applications of AI beyond text and images. It delves into the world of AI-generated videos, audio, and programming assistance.

Visit macpreneur.com/community to get the chance to become one of the founding members of the Macpreneur community.

A video version and all the links are available at macpreneur.com/episode77

Takeaways:

  1. AI tools like HeyGen, Runway Gen-2, and Stable Video Diffusion enable the creation of AI-generated videos.
  2. Speech synthesis tools like Descript, OpenAI TTS, the Voices App and ElevenLabs offer stock AI voices and voice cloning capabilities.
  3. AI music generation tools like AIVA and Lyria provide unique features for composing music.
  4. GitHub Copilot is a powerful AI assistant for programmers, offering code completion, review, documentation, and more.
  5. Screenshot to Code app converts website screenshots into HTML, CSS, React, and Bootstrap code.
  6. TraceAI speeds up iOS app prototype creation by converting text ideas into SwiftUI-based applications.
  7. AI has the potential to revolutionize the way we work and create by offering endless possibilities.
  8. Embracing AI in creative projects can enhance productivity and open up new avenues for innovation.

Want to get personalized time-saving tips to be more efficient on your Mac?

Answer a few questions about how you're currently dealing with unnecessary clicks, repetitive typing and file clutter. It's FREE and takes less than 2 minutes!
https://macpreneur.com/tips

Wondering where to start streamlining your solo business?

Kickstart your unique journey with a 360° Tech Diagnostic
https://macpreneur.com/diagnostic

Macpreneur Community Waitlist

Become one of the founding members of the Macpreneur community!
https://macpreneur.com/community

MP077 - Unlock the Power of AI Beyond Text and Images


Teaser

Did you know that AI can do much more than just generate text and images? In this episode, I'll explore how AI can help generate videos, audio, and even help with programming tasks. 

Stay tuned until the end, as you'll discover a mind-blowing AI tool that can generate prototypes of iPhone apps without needing any programming knowledge.

I'll unpack all of this after the intro.


Welcome

If this is the first episode that you're listening to, welcome to the Macpreneur tribe, and if you're a longtime Macpreneur listener, thank you for tuning back in. As a fellow solopreneur, I appreciate that you dedicate these 15-ish minutes to me every week. Over the past few months, I've had a chance to interact with other Macpreneurs, and some of them expressed interest in being able to connect and talk with other solopreneurs who run their businesses on their Mac.

Now, before launching a Macpreneur community, I would like to be sure that enough of you actually want that. So if this idea sounds interesting, then head over to macpreneur.com/community, where you will be able to join the waitlist. Just enter your name and email address, and pick all the online platforms that you prefer using.

So, to get the chance to become one of the founding members of the Macpreneur community, just visit macpreneur.com/community.


Video generation

So creating images is something that can be done quite easily now with AI, but generating videos is another thing entirely. 

So, if that's your thing, there are three tools to watch at the moment.

The first one is called HeyGen, and it's capable of creating AI avatars and voices. The second one is Runway Gen-2, and the last one is Stable Video Diffusion. 


HeyGen

So, HeyGen provides prebuilt human avatars, so it's great for explainers and tutorials. It's also multilingual, and it offers the capability to clone yourself, so both the video and the voice.


Runway Gen-2

Another one is Runway Gen-2. This one is different in the sense that the goal is to create videos, and you can do that from different kinds of prompts.

You can write some text. You can also combine text and images, and that would be the prompt for creating the video.

You can also use only an image if you want.

And there is also an image plus video mode, which means you give it an image, you give it a video, and it'll combine both to create a new video.

They even offer a storyboard option, so you can create mockups, and then from the mockup, it'll create a video based on the text prompt.


Stable Video Diffusion

The last one is Stable Video Diffusion. It's coming from the same people who developed Stable Diffusion, so it's Stability AI.

It's an open source model, but at the moment it is limited to 5 seconds of video.

It is resource intensive, although it's something that you can run locally on a computer. If you imagine that you have 30 frames per second and 5 seconds of video, it needs to generate 150 images in succession that actually match when you play them together.
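
To make that arithmetic concrete, here is a minimal sketch in Python. The function name frames_needed is purely illustrative and is not part of Stable Video Diffusion; it only multiplies the frame rate by the clip length.

```python
def frames_needed(fps: int, seconds: float) -> int:
    """Return how many still images a clip of the given length requires."""
    return round(fps * seconds)


# 30 frames per second for 5 seconds of video
print(frames_needed(30, 5))  # 150
```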


Audio generation

On the audio side of things, there are two main areas, creating speech and generating music.

On the speech synthesis side, I want to highlight four solutions, Descript, an OpenAI model called Text to Speech, the Voices App, and then a service called ElevenLabs. 


Descript

So Descript is something I've already talked about in previous episodes. I'm using it to edit the podcast, but it also offers stock AI voices.

From the beginning of season two and almost up to, I would say, the middle of season three, the intro and the outro of the Macpreneur podcast were using the stock AI voice called Don.

On top of that, it's possible to clone your own voice. I have the Creator plan, and the vocabulary is limited to 1,000 words, meaning that I cannot correct everything with voice cloning.

If the words that I try to correct are outside of the vocabulary, Descript will replace the sound with something like gibberish.

From the Pro plan, which is $30 per month, it's possible to have unlimited vocabulary for voice cloning.


OpenAI TTS

And then OpenAI released a model called TTS, for text to speech. It's the model that they use when you interact by voice with ChatGPT, so the mobile application, but it's also possible to invoke that model through the API, provided that you have an API key.

Now, having an API key is good, but then you need an application that will interact with the API. 


Voices app

And the good news is that the developer of MacWhisper and MacGPT is offering an app called Voices.

It requires macOS 13 Ventura or later, and obviously you need to provide your OpenAI API key, but it's a very nice user interface. 

So you type your text, then you choose among the six available voices. You decide the quality of the audio, and then you can decide the output format. I'm choosing MP3.
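
The choices described above (text, voice, quality and output format) correspond to parameters of OpenAI's speech endpoint. The following is only a minimal sketch of how an application could build such a request with Python's standard library. It is not the Voices app's code; the function names build_speech_request and synthesize are hypothetical, and the endpoint, model names and voice names reflect my understanding of the public API at the time of the episode.

```python
import json
import os
import urllib.request

# Assumed values from OpenAI's public documentation (late 2023).
API_URL = "https://api.openai.com/v1/audio/speech"
VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")
MODELS = ("tts-1", "tts-1-hd")  # standard and higher quality


def build_speech_request(text, voice="nova", model="tts-1",
                         audio_format="mp3", api_key=""):
    """Build (but do not send) an HTTP request for text to speech."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": audio_format,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, out_path, **options):
    """Send the request and save the returned audio; needs OPENAI_API_KEY."""
    request = build_speech_request(
        text, api_key=os.environ["OPENAI_API_KEY"], **options
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

With an API key set in the OPENAI_API_KEY environment variable, a call such as synthesize("Welcome to the show", "intro.mp3", voice="nova") would write the audio to a file. Nothing is sent over the network until synthesize is called.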

For the newest Macpreneur intro and outro jingle, I'm using the voice Nova, and I've actually created that intro and outro using the Voices app on my MacBook Pro.

It's very quick and very inexpensive. To give you an order of magnitude, the intro and the outro, in total, it's roughly 500 characters, and it cost me less than one cent. 
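
As a rough check of that order of magnitude, here is a small Python sketch. The prices in the table are an assumption on my part (what I believe OpenAI's list prices per 1,000 characters were in late 2023) and may have changed since; the function name estimate_cost_usd is hypothetical.

```python
# Assumed late-2023 list prices in US dollars per 1,000 characters.
PRICE_PER_1K_CHARS_USD = {"tts-1": 0.015, "tts-1-hd": 0.030}


def estimate_cost_usd(characters: int, model: str = "tts-1") -> float:
    """Estimate the cost of converting the given number of characters."""
    return characters / 1000 * PRICE_PER_1K_CHARS_USD[model]


cost = estimate_cost_usd(500)
print(f"{cost * 100:.2f} cents")
```

Under those assumed prices, 500 characters with the standard model come out at about three quarters of a cent, which is consistent with the "less than one cent" figure above.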


ElevenLabs

The last service is called ElevenLabs. It's a freemium type of service. It offers text to speech with their own stock AI voices.

Like Descript, you can do voice cloning.

They now offer speech to speech as well. So you give them an audio file and they can create another one, changing either the voice or the language. They actually support 29 languages.

The only downside is that if you want to use that service for commercial use, you need a paid account.

On the music generation side, I have selected two services, AIVA and Lyria. 


AIVA

AIVA, A-I-V-A, is a Luxembourgish startup. Before COVID, I attended an ICT conference where they were one of the startups taking part in a pitch competition, and they actually won first prize.

It's possible with AIVA to actually compose music, and it's used in the movie industry.

So for movies, they can generate scores, and yeah, it's a company that is really at the forefront of AI music generation.


Lyria by Google's DeepMind

And then Google's DeepMind division has released a model called Lyria, and with that model, they offer two solutions.

One solution is called Dream Track, and it can be used to accompany YouTube Shorts. It's a text to music generation tool, and you can select the style of popular artists who have licensed their voice to Google.

And then the other one is called Music AI Tools for YouTube.

It's a set of tools for musicians. It can, for instance, convert a hummed melody into a saxophone line, or you can transform beatboxing into a drum loop. So it helps musicians to quickly compose music.


AI for coding

On the coding side, there are three services to watch for. 

The first one is GitHub Copilot, the second one is Screenshot to Code, and the third one is TraceAI.


GitHub Copilot

So, GitHub Copilot is for programmers who use GitHub as their repository for their source code. 

It can do code completion and code review, and it can also create documentation and unit tests.

So if you are a developer and you have never used Copilot, I can only encourage you to check it out.


Screenshot-to-code

And then screenshot-to-code. 

This is a terrific app that takes a screenshot of an existing website and, from the screenshot, recreates the HTML code and the CSS, and it's also compatible with React and Bootstrap.

Basically, it's now using the latest GPT-4 Vision model to understand what it sees in the screenshot, and then it generates the code.

It's using also DALL-E 3 to generate images, if the website had images. And now you can even enter a URL. You don't need a screenshot at all. You just enter the URL and it will clone the website that is behind the URL.


TraceAI

So the last one is TraceAI.

It's a way to quickly convert ideas from text to an iOS application using SwiftUI.

This application lets you export an Xcode project, and you can even run a test version directly on your iPhone.

With TraceAI, I've been able to create a prototype of an iOS app that I had in mind, and I was able to do that in roughly two hours, and I have absolutely no training on SwiftUI.

It's not perfect yet. At one point the code didn't work, but I was still able to go into the code, and because I know the basics of programming, even though I don't know SwiftUI, I was able to spot the mistake. There was a parenthesis missing somewhere, so I just typed the parenthesis, and then the application ran inside the browser.


Recap

To recap, I've explored a bunch of AI tools that can help generate videos, speech, and music, as well as help programmers and non-programmers alike to turn their ideas into websites and applications. If you found this episode useful, I'd be super grateful if you could share it, and if you do that on Instagram, you can tag me. My handle is @MacpreneurFM.


Next & outro

In the next episode, I will explore three trends that I believe will make AI even more useful and relevant than it has been in 2023.

So that's it for today. If you haven't done it yet, visit macpreneur.com/ai to grab your own copy of the top 10 AI tools cheat sheet.

This PDF will give you the edge that you need to boost your solo business in this fast-paced world.

Once again, it's macpreneur.com/ai

And until next time, I'm Damien Schreurs, wishing you a great day.

